Matthias Buchmeier
Welcome!
Hello, and welcome to Wiktionary. Thank you for your contributions. I hope you like the place and decide to stay. Here are a few good links for newcomers:
- Wiktionary Tutorial
- How to edit a page
- How to start a page
- Our format guidelines
- Criteria for inclusion
- Wiktionary Sandbox (a safe place for testing syntax)
- What Wiktionary is not
- FAQ
I hope you enjoy editing here and being a Wiktionarian! By the way, you can sign your name on Talk (discussion) and vote pages using four tildes, like this: ~~~~, which automatically produces your name and the current date. If you have any questions, see the help pages, add a question to one of the discussion rooms or ask me on my Talk page. Again, welcome!
— This unsigned comment was added by Widsith (talk • contribs) at 11:38, 13 June 2007 (UTC).
Etymology templates
Hi there,
Thanks for all the Spanish words you've been adding!
I just wanted to let you know about our etymology templates. For example, instead of writing "Latin", we write "{{L.}}", which links to the Wikipedia article on Latin and also adds the entry to Category:Latin derivations.
Thanks again! :-)
—RuakhTALK 19:35, 15 June 2007 (UTC)
- A clarification of Ruakh's comments: When writing an etymology for a non-English word, the language code should be put as a parameter of the template. So, for example, if a Spanish word derives from Latin, one would input {{L.|es}} instead of writing "Latin". Thanks. Atelaes 19:42, 15 June 2007 (UTC)
Accented forms
For Spanish, we keep separate entries for accented and unaccented forms. So there should be an entry for quien and an entry for quién. Each page should explain how that spelling is used in Spanish, and each should link to the other page at the top using a {{see}} template. Also, on the list of frequent Spanish words, the links should point to the exactly matching form, not to a root form. --EncycloPetey 20:21, 21 June 2007 (UTC)
- I don't think I understand your question. Here on the English Wiktionary, we have an entry for every possible form of a word, and we do link between forms that come from the same lemma. For example, we have the lemma page blanco, but we also have pages for blanca, blancos, and blancas. We have the lemma page ganar, but we also have pages for gano, ganan, ganaba, gané, etc. Notice that the non-lemma pages explain how they are related to the lemma form rather than give an English translation. A lemma page has all the definitions, inflection tables, and so on. The non-lemma pages may have quotations and a pronunciation, but otherwise give only an explanation of how it is formed from the lemma, with a link to the lemma.
- Yes, adding a separate column to list the lemma form would be a good idea. Notice that we use the term "lemma" instead of "root" because root has a different meaning as a header. It is used for Hebrew consonant combinations from which many words can be created by the addition of a set of vowels. --EncycloPetey 17:18, 22 June 2007 (UTC)
Telefonar
I did a Google search and a double check on my entry.
It seems that through our Spanish translation project a Portuguese word showed up. I kinda overlooked one letter, but we can fix that problem really easily.
Language templates
Please don't add language templates like {{it}} or {{es}} to pages. We don't use them on the English Wiktionary. The only reason those templates exist here at all is so that we can subst them when they get added. --EncycloPetey 08:39, 6 July 2007 (UTC)
- We keep the templates around for two reasons. First, as you noticed, some other wiktionaries use the ISO language code templates. Editors from those projects occasionally insert the codes here, and we want to be able to subst them. Second, we use them to coordinate bot verification of language section headers. I believe they may also be used in some of the other context templates, but I'm not certain about that. We need to have them around. If you want to use them, please subst them when you put them in. There is a bot that occasionally does this, but I'm not sure whether AutoFormat does that or if it's another bot that hasn't been run for a while. --EncycloPetey 09:31, 6 July 2007 (UTC)
Oh, and on an unrelated issue, could you check the German translations on the page Appendix:Official languages of the European Union matrix ? Thanks. --EncycloPetey 08:41, 6 July 2007 (UTC)
Related terms
Please note that the Related terms section is only for words related by etymology, not simply for words associated by meaning. So clean, cleaner, cleaning, and cleanliness are related because they come from the same root word, but tidy, neat, and straighten would not be related. --EncycloPetey 08:49, 6 July 2007 (UTC)
- It depends. If they have nearly the same meaning, use Synonyms. If they aren't words with the same meaning, but a closely related meaning, they can go under See also, but that should be limited only for one or two key words needed to understand the definition. We tend to discourage creating lists of words in the normal entry namespace except under the standard headers (Synonyms, Antonyms, etc). --EncycloPetey 09:28, 6 July 2007 (UTC)
What is the reason for commenting out the Spanish translations for pancake? I think hot cake is quite unknown among Spanish speakers and does not even have a RAE entry, nor a Wiktionary entry yet. Matthias Buchmeier 09:54, 6 July 2007 (UTC)
- The translations and pronunciations belong on the entry pages for the translations, not in the Translations section for the page pancake. The Translations section is only for translations, with gender and transliteration when necessary. It is not for a complete explanation. See Entry Layout Explained for more information. --EncycloPetey 16:48, 6 July 2007 (UTC)
Just Chile? I read the word in a Vargas Llosa book, which, of course, is Peruvian. Widsith 14:33, 8 July 2007 (UTC)
Yes, I'm sure it is used in Chile. But surely it seems to be used in other countries as well? Widsith 08:55, 9 July 2007 (UTC)
Spanish entries
Thanks for following up on some of my Spanish entries. I've been doing a fair amount of work in this area recently and it's good to have a second pair of eyes. Let me know if you have any specific suggestions. P.S. What do you think of the entries that have "Spanish (Castillian)" as the language header? I've been changing them to just say "Spanish" on any entries that I edit. Mike Dillon 01:40, 28 August 2007 (UTC)
eeks. Do you know if that just happened recently? I made an edit to the template Template:es-conj-ar recently, but that was intended only to link the verb forms. Dmcdevit·t 07:42, 2 September 2007 (UTC)
- Ah, it was just the wrong template then? [1]. Hm, and it was there for a long time. Eventually I'd like to consolidate all these templates into one that's easier to use; there is no real reason to have created a different template for all these forms, when we could just use a parameter. But I haven't found the time yet. Dmcdevit·t 07:45, 2 September 2007 (UTC)
Thanks for that - I shall attempt to change red links to blue. The only false red that I saw on a quick look was nell which is probably nell'. Cheers. SemperBlotto 08:38, 8 October 2007 (UTC)
These pages are very useful. Most of the red links on your larger lists are combined forms of verbs and pronouns - there are very very many of these and we haven't yet decided how to treat them - we have a few (see dimmi as an example). Some seem to be proper nouns in lowercase, and some seem to be real words but with non-standard accents (grave instead of acute etc) - I shall talk to User:Barmar about those (if she reappears). Thanks again. SemperBlotto 11:34, 8 October 2007 (UTC)
Admin?
Hey, I just noticed you adding the {{delete}} tag on mamcita. It made me wonder, is there any particular reason you aren't an admin yourself? Would you object if I nominated you? Dmcdevit·t 06:32, 17 November 2007 (UTC)
- I think you should decide how much more work you want to put in, if any; it won't be forced on you. The nomination is at Wiktionary:Votes/sy-2007-11/User:Matthias Buchmeier for admin. You have to go there and give your acceptance. Thanks! Dmcdevit·t 05:02, 20 November 2007 (UTC)
New buttons
Welcome to sysophood (and Merry Christmas). SemperBlotto 16:36, 24 December 2007 (UTC)
Italian frequency lists
Hi there! Do you have any idea why I can no longer open your very useful frequency lists (apart from User:Matthias_Buchmeier/Italian_frequency_list-40000- & User:Matthias_Buchmeier/Italian_frequency_list-50000-)? --Barmar 12:45, 23 June 2008 (UTC)
- I still can't see them, I just get empty pages. I've also tried changing browser without any result. Thanks anyway. --Barmar 06:12, 24 June 2008 (UTC)
- I can only view them using Firefox. Blank pages with Internet Explorer. SemperBlotto 10:32, 11 July 2008 (UTC)
- Thank you Matthias for the new lists, now I'm able to see them. They're so useful! (I'll turn those apocopic forms blue) Ciao --Barmar 12:54, 13 August 2008 (UTC)
Good job with {{es-note-noun-f-starting-with-stressed-a}}. I attempted to show that an adjective between the article and the noun reverts the article to la, but unfortunately, gran doesn't work with all of the entries that use the template, e.g. *la gran agua. So, I removed the automated counterexample. Anyway, well done. Rod (A. Smith) 16:24, 4 July 2008 (UTC)
sandbox
Something strange happened when you created your sandbox. I have moved it to User:Matthias Buchmeier/sandbox SemperBlotto 10:30, 11 July 2008 (UTC)
Bot
Hi, I saw your WT:VOTE; are you using the same code as SemperBlottoBot, FitBot and the rest, or are you using your own code - in which case WT:BOT asks that you publish the code somewhere. User:BuchmeierBot/code would be great. Conrad.Irwin 20:10, 21 September 2008 (UTC)
- No, I use a combination of bash and gawk scripts to analyze the page and write the wiki-code, and mvs to upload the pages. The code is published under User:BuchmeierBot/code. Matthias Buchmeier 08:15, 22 September 2008 (UTC)
I'd just like to point out that you're doing a great job with Spanish verb forms through BuchmeierBot. I remember back when the templates were really disorganized and I tried to fix some stuff, but I really couldn't do much without a bot. Needless to say, I come back and you've got a bot that does it perfectly. Good job! Ian Burnet 07:10, 7 January 2009 (UTC)
es-verb(2)
There is discussion at Wiktionary talk:About Spanish. I think I can make the coding of the template look more elegant, but there is a proposed functionality set up at {{es-verb2}} that would combine the three templates currently in use. (yes, actually combine them, not simply call them) --EncycloPetey 15:25, 6 October 2008 (UTC)
I've started a VOTE on implementing this template in place of the existing ones. --EncycloPetey 00:23, 7 October 2008 (UTC)
You're right- I usually check the conjugation patterns from a variety of sources (verbix is pretty reliable)- but for some reason I didn't this time. It should be conjugated like volver- sorry for the confusion. Nadando 00:17, 7 October 2008 (UTC) Should be fixed now. Nadando 00:18, 7 October 2008 (UTC)
Hi. I was looking for a good translation of the above. As you seem to be something of an expert in the Anglo-Spanish field, I was wondering if you might know? Cheers. -- ALGRIF talk 14:43, 29 October 2008 (UTC)
- The translation is punta del iceberg. Matthias Buchmeier 11:19, 30 October 2008 (UTC)
- They use that because it is the dubbed version on the US films. But it is not the correct way. There is a particularly Spanish way of saying this, and I can't find it. It's driving me nuts :-P -- ALGRIF talk 11:31, 30 October 2008 (UTC)
- la punta del iceberg is certainly a well known saying among Spanish speakers, see e.g. iceberg at www.rae.es or do a google search (gives around 600,000 hits which is quite a lot). Matthias Buchmeier 14:08, 30 October 2008 (UTC)
- Thanks for your help. I know it is in common use, but I felt sure there was a Spanish expression that meant the same. I'm probably getting confused. (It happens quite a lot. ;-/ ) Cheers -- ALGRIF talk 14:29, 30 October 2008 (UTC)
- Similarly in Italian, it is punta dell'iceberg Venere 14:55, 30 October 2008 (UTC)
I found an error in the first-person plural future tense. All verbs by bot from this template will be missing this form (e.g. see gruñir). Fortunately, no incorrect entries were created as a result of this error, but there is a red link now in all the tables. --EncycloPetey 20:59, 8 November 2008 (UTC)
Hi, you created the compañia entry, but I found it at more sources with the stressed "i" (compañía). But since you're a Spanish expert and I'm just a beginner with that language, I wanted to ask, before I edit. Hi.ro 19:46, 13 November 2008 (UTC)
- Of course it's compañía. compañia is a misspelling and was my error. Matthias Buchmeier 10:22, 14 November 2008 (UTC)
This irregular verb has the wrong conjugation table, so many (all?) of the conjugated entry forms are incorrect. --EncycloPetey 05:49, 26 November 2008 (UTC)
This verb also had an incorrect conjugation table. The forms with "qu" were not created by bot, but were created with "c" instead. --EncycloPetey 00:50, 27 November 2008 (UTC)
Also: transparentar. This verb was a typo when it was originally entered (as tranparentar). I have moved it to the correct location, but the conjugation table and all the inflected forms were created using the typo. --EncycloPetey 18:47, 27 November 2008 (UTC)
Also: sojuzgar. The conjugation table did not reflect the g -> gu in some forms. --EncycloPetey 01:38, 1 December 2008 (UTC)
Acceleration
I've just updated the script to do [2] for you. So if you clear your cache (ctrl+shift+F5) it will now do that. Conrad.Irwin 11:02, 18 December 2008 (UTC)
- Thank you. It now works for Spanish nouns as expected for me. Matthias Buchmeier 11:20, 18 December 2008 (UTC)
Implications of new kludge
I'm not sure if you've seen Robert's new trick concerning categorization, but would you please read Wiktionary:Grease pit#Template:count page: Building a Better Kludge. As the owner of a bot which creates many form-of entries, your views on new formatting for these entries would be appreciated. -Atelaes λάλει ἐμοί 07:18, 24 December 2008 (UTC)
This is another verb that had the wrong conjugation template when the forms were generated. Note: I'm working backwards alphabetically through all the Spanish verbs in Category:Spanish verbs and am about halfway through. So far, there have been very few of these kinds of errors, which is a good sign. --EncycloPetey 08:38, 7 January 2009 (UTC)
Also gargarizar, fertilizar, and edificar had the wrong conjugation. --EncycloPetey 20:34, 7 January 2009 (UTC)
The verb digerir seems to have had the wrong conjugation table. I think I've switched to the correct one, but you might want to check. --EncycloPetey 00:25, 25 January 2009 (UTC)
Another one that had the wrong table. Note: At this point, I've gone through all the verbs except for those starting with "a-", so there shouldn't be many errors left. --EncycloPetey 00:27, 15 February 2009 (UTC)
Can you reply to my query, please? That is if you are indeed the copyright holder. Mglovesfun 11:44, 21 May 2009 (UTC)
- Yes, I am the copyright holder. The lists are released under both the GFDL and the LGPL licenses. Of course, you can import them to the French Wiktionary. Matthias Buchmeier 08:36, 22 May 2009 (UTC)
huh?
impaciente — [ R·I·C ] opiaterein — 13:32, 3 June 2009 (UTC)
While the general rule for gender-paired nouns in Romance languages is that the male is used when the gender is unknown, the words for goat descending from the Latin capra are an exception, with the feminine form used for goats of indeterminate gender. — Carolina wren discussió 17:03, 31 August 2009 (UTC)
- Of course, well done! I must have been distracted. Matthias Buchmeier 07:42, 1 September 2009 (UTC)
Your bots' feed me messages
Hi there. Can I use your bots' feed me messages on both the feedme page and his userpage for my bot please? I'll, of course, give you credit where credit is due :) Please let me know ASAP. Thanks, Razorflame 23:17, 12 November 2009 (UTC)
Wouldn't the plural of tobogán be tobogánes? Razorflame 10:41, 7 December 2009 (UTC)
I do not want to come across as contumelious but please consider casting your vote for the tile logo as—besides using English—the book logo has a clear directionality of horizontal left-to-right, starkly contrasting with Arabic and Chinese, two of the six official UN languages. As such, the tile logo is the only translingual choice left and it was also elected in m:Wiktionary/logo/archive-vote-4. Warmest Regards, :)--thecurran Speak your mind my past 03:21, 2 January 2010 (UTC)
Hi there Matthias. Your bot missed this form of entries for the entry vacilar. Just thought that I'd let you know :) Cheers, Razorflame 20:49, 6 January 2010 (UTC)
For the Latin section, why have you listed descendants on the infinitive page and not the lemma, which is the current standard, at fingō? Also, on fingō, surely ficción comes from fictiō and not fingō? In this case it should be reserved for fictiō and not fingō. Thanks in advance. Caladon 17:35, 8 January 2010 (UTC)
- In response to your question on my talk page, the etymology should link to the lemma whenever possible as it says on WT:ALA. If you want to show that a word comes from the infinitive, you should write the etymology as shown in the example on that page; so if it comes from an inflected form, show both in the etymology, but don't use hidden links (the alternate text parameter, linking to the lemma but not displaying as such), since they're undesirable as well. Caladon 18:42, 10 January 2010 (UTC)
I have hidden your last edits in cascabel for this reason. If the word is/was used as such other than in "serpiente de cascabel" please add in which country, area or time that use occurs/occurred. I have changed the order of definitions because the most common meaning of the word clearly is "jingle bell". Regards. --81.39.216.10 10:35, 10 January 2010 (UTC)
Caló
SIL just retired the language code for "Caló" [rmr]. They split it into "Erromintxela" ("Basque Romany") [emx] and made a new "Caló" [rmq] identifier. Would you be able to update your edits involving Caló (chavall, molar, currar, chalar, and camelar) to one of these new identifiers (probably [rmq])? Thanks. --Bequw → ¢ • τ 22:07, 21 January 2010 (UTC)
login
hi, your bot isn't logged in --Rising Sun talk? contributions 11:42, 20 April 2010 (UTC)
- It seems that WWW-Mediawiki-Client is not able to log in anymore. I get an error message saying: Login error. There seems to be a problem with your login session; this action has been canceled as a precaution against session hijacking. Please hit "back" and reload the page you came from, then try again. However a manual login to my bot-account from Firefox works OK. Does anyone know if the Mediawiki software has been recently updated? Matthias Buchmeier 14:34, 20 April 2010 (UTC)
- I had the same error message. Others apparently have had to download the whole bot package again. I tried that, but it didn't work, so I got disgruntled and gave up, hoping that some other bugger will take over the bot from me. --Rising Sun talk? contributions 17:05, 20 April 2010 (UTC)
Spanish suffixes
Hello. When editing Spanish suffixes, maybe you should specify the parameter 2= of {{es-suffix}}, to ignore the hyphen and sort them alphabetically in Category:Spanish suffixes. --Daniel. 07:17, 6 May 2010 (UTC)
Hi there Matthias. Can you have your bot add the verb form to this because the original verb form added to the entry isn't current at this point in time. Thanks, Razorflame 03:54, 11 May 2010 (UTC)
- Never mind. I fixed the page for you :) Razorflame 04:38, 11 May 2010 (UTC)
I urge you to vote. (I don't know which way you'll vote, but I want more voices, especially English Wiktionarians' voices, heard in this vote.) If you've voted already, or stated that you won't, and I missed it, I apologize.—msh210℠ 17:01, 21 May 2010 (UTC)
Spanish word list
Hi Matthias. Is it possible to get the original word list from which the Spanish word-frequency list was compiled? Comber 13:20, 28 August 2010 (UTC)
- Sure. The compressed file is about 70Mb. Send me an email and I'll try to send it as attachment. Matthias Buchmeier 08:05, 31 August 2010 (UTC)
- I don't know how to get your email address or send you an email. I presume you don't want your email on pages open to web-crawlers. I can give you an FTP that you can upload to. Comber 16:59, 1 September 2010 (UTC)
- Just follow the link 'E-mail this user' in the 'Toolbox' on the left side of my user page. Matthias Buchmeier 08:49, 2 September 2010 (UTC)
Macho and macha
Dear Matthias. Sorry to contradict you. "Macha" is not the feminine of "macho" as in "female" and "male", respectively. As I mentioned before, the corresponding feminine word to "macho" (in Spanish) is "marimacho". "Macha" is a kind of mollusc and "salsa macha" is a very hot and spicy sauce used to dress mostly seafood (the same as "salsa bruja"). As for the verb "machar", it has nothing to do with the word "macho" and means to "hit" or "break". In these cases, I suggest creating an article for each of the two words.--Estaurofila 14:24, 14 September 2010 (UTC)
- He's blocked for a day, so I'll butt in. Machar does have its own entry. We allow inflected forms, it doesn't matter whether the etymologies are the same - you can add more than one. Mglovesfun (talk) 14:45, 14 September 2010 (UTC)
Isn't it better to keep it as a deprecated template? It actually was used three times, which would have been one as one was on a talk page, the other two would have been removed by AutoFormat, except it's down right now. Mglovesfun (talk) 16:56, 29 October 2010 (UTC)
Bisonte, not "bisón"
Hi Matthias. At least in User:Matthias_Buchmeier/en-es-e and User:Matthias_Buchmeier/en-es-w you have listed "bisón europeo". The form "bisón" does not exist in Spanish. It only exists on (a few) pages on the internet due to a bad translation from English (bisón instead of bisonte) and a confusion with "visón" (mink), which happens to be pronounced the same in Spanish. The proper form is "bisonte (europeo, americano...)". Regards. --82.198.250.2 13:56, 1 November 2010 (UTC)
- It seems that the Spanish translations at wisent and European bison have already been corrected. The English-Spanish dictionary files will be updated with the next database dump in approximately 2 weeks. Matthias Buchmeier 12:10, 2 November 2010 (UTC)
- I did not know it was automatically updated. Thanks. --82.198.250.2 11:23, 6 November 2010 (UTC)
Using the dictionaries in a closed source software
Hi Matthias. Nice work compiling those bilingual dictionaries from the Wiktionary dump. I would like to create a small mobile phone dictionary software and I wonder if it is OK to use your German dictionary, provided I add a link to your page or to the Wiktionary web page. Do the licences allow this kind of usage? Regards. Charlie137 10:41, 11 January 2011 (UTC)
- Yes, the licence does allow re-use; for details have a look at [3]. Matthias Buchmeier 12:19, 12 January 2011 (UTC)
Hi Matthias. We've not really crossed paths before. I like to add some Spanish entries from time to time (I live in Spain, btw). I have followed your entries a bit, and I have seen you are a good, serious, consistent contributor. I don't know where you live, but if you live in or visit Spain, you will have heard the word "mola" or the phrase "mola (un) montón" - meaning approximately, "that's great" or "that rocks". It has been around quite long enough to merit a Wikt entry. But here's my problem. It seems to be a highly defective verb form, i.e. it seems to be a verb with only one form - mola. I really have no idea how to make a decent entry. Your input would be very much appreciated. Cheers. -- ALGRIF talk 13:01, 11 February 2011 (UTC)
- I currently live in Germany, but I travel to Spain frequently and my wife is a native Spanish speaker. You could add it as a usage example or in a Usage Notes section at the pages molar (I already did that) or mola, or add e.g. mola un montón as a separate entry. The latter has the disadvantage that some user might think your entry is SOP and delete it. Matthias Buchmeier 13:23, 11 February 2011 (UTC)
- Thanks. The entry looks just fine. -- ALGRIF talk 14:17, 11 February 2011 (UTC)
This is 'ready', see this edit in my sandbox which shows no bugs. Mglovesfun (talk) 11:28, 2 March 2011 (UTC)
- It seems OK to me now. However the syntax of {{es-noun}} is not compatible with {{es-noun-mf}}, which accepts the mpl parameter for masculine and the fpl parameter for feminine forms respectively, while for {{es-noun}} the plural has to be specified as the second unnamed parameter. So you should be very careful with bot-replacements, e.g. your replacement on page vendedor still doesn't work correctly with {{es-noun/new}} (wrong plural). This kind of parameter incompatibility might IMHO also cause errors when unwary users, familiar with {{es-noun-mf}}, switch to the new template. Matthias Buchmeier 12:08, 2 March 2011 (UTC)
- Not quite right; it doesn't have to be a second unnamed parameter as it accepts {{{p}}} or {{{pl}}}. It wouldn't accept mpl for a masculine noun, though, only the three previously mentioned. Having said that, I think it could be made to work that way, albeit in such a silly way it would be saner just to correct the entries. Mglovesfun (talk) 12:14, 2 March 2011 (UTC)
- I just figured out that {{es-noun}} already handles the dual gender case correctly. For a dual gender noun, mf has to be specified rather than m or f as the gender parameter; the template will then determine the gender of PAGENAME on its own. See my latest edits on vendedor and vendedora. Matthias Buchmeier 12:38, 2 March 2011 (UTC)
- I don't think anyone (certainly not me) thinks that all the es-noun-* templates should be orphaned as soon as they fail RFD (if they do, of course). I suspect such an orphaning wouldn't be done at least until the summer, for reasons you've gone into above and on User talk:MglovesfunBot. Mglovesfun (talk) 13:53, 2 March 2011 (UTC)
Wouldn't this be lowercase? Mglovesfun (talk) 16:53, 16 March 2011 (UTC)
- No, according to the DRAE, hacienda in the meaning of THE governmental department dealing with finances and taxes is uppercase. Matthias Buchmeier 12:17, 17 March 2011 (UTC)
Translations from an Xml dump
In reference to a prior conversation, Conrad's script is already written in Python, but it has a lot of stuff in there for creating indices, and he's been busy outside Wiktionary for weeks. What I'm looking for specifically is something that:
- Parses out translations, by language
- Could be a standard Wiktionary module in PyWikipedia
Do you want to do that? ~ heyzeuss 09:02, 23 March 2011 (UTC)
- I'm sorry but my Python knowledge is very poor and I currently don't have the time to learn it. Anyhow my code does quite simple regexing, so someone knowing Python well could IMHO easily translate it to Python. On the other hand I could help you to modify my code to fit whatever output format you need if AWK is OK for you. Matthias Buchmeier 10:07, 23 March 2011 (UTC)
- Hmm, gotta figure out how to run AWK on Windows, because I'm just another assimilated drone. :) I've just barely stuck my toe in the water with Python. I would like a list with English words that have Finnish translations, their translations in Finnish, and senses. Tab separated is usually easiest to deal with. I'll take a look at getting AWK onto my machine here, first. ~ heyzeuss 11:51, 23 March 2011 (UTC)
- You actually need Gnu-Awk AKA Gawk, which has some extended regex functionality that I use. There are Windows versions available eg. on gnuwin32.sourceforge.net/packages/gawk.htm or you can install Cygwin (www.cygwin.com) which brings preconfigured versions of most unix programs, however at the cost of occupying a bit more disk-space. Matthias Buchmeier 13:14, 23 March 2011 (UTC)
- Installed Gawk, ran without any modifications: "bzcat enwiktionary-20110331-pages-articles.xml.bz2|gawk -v LANG=Finnish -v ISO=fi -f trans-en-fi.awk|sort >en-fi.wiki". However, the output was an empty file called "en-fi.wiki". Can you give any pointers? ~ heyzeuss 14:00, 23 March 2011 (UTC)
- This command line requires, of course, that bzcat and sort (both standard programs on Unix) are installed and in the path. Alternatively you can unpack the dump with whatever unpacker you have at hand and then run: gawk -v LANG=Finnish -v ISO=fi -f trans-en-fi.awk enwiktionary-20110331-pages-articles.xml >OUTPUT.TXT which should then give you the same result, but unsorted. Of course gawk, trans-en-fi.awk, and enwiktionary-20110331-pages-articles.xml must be in the path, or otherwise the respective full paths must be given on the command line. I hope that helps you. Matthias Buchmeier 14:54, 23 March 2011 (UTC)
- Yeah, that helps. It works with a bare xml file. Sort is also the name of a Windows command that does the same thing. Thank you for your help. :) ~ heyzeuss 17:51, 23 March 2011 (UTC)
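For readers who would rather follow the Python route mentioned above, the request (translations parsed out by language, as a tab-separated list of English word, sense, and translation) can be sketched roughly as below. This is not a port of trans-en-fi.awk; it is a minimal illustration that assumes translation tables are delimited by {{trans-top|sense}} and {{trans-bottom}} and that individual translations use {{t|code|word}}, {{t+|code|word}} or {{t-|code|word}}. The function names and the sample wikitext are made up for the example, and a real dump would need additional XML handling.

```python
import re

# Hypothetical helpers for illustration only; not taken from trans-en-fi.awk.
# Assumed layout: {{trans-top|sense}} opens a translation table and
# {{trans-bottom}} closes it.
TRANS_TOP = re.compile(r"\{\{trans-top\|([^}|]*)")
# Matches {{t|fi|word}}, {{t+|fi|word}} and {{t-|fi|word}};
# captures the language code and the translated term.
TRANS_ITEM = re.compile(r"\{\{t[+-]?\|([^|}]+)\|([^|}]+)")


def extract_translations(title, wikitext, iso):
    """Yield (title, sense, translation) for each translation tagged with iso."""
    sense = ""
    for line in wikitext.splitlines():
        top = TRANS_TOP.search(line)
        if top:
            # Remember the sense gloss until the table is closed.
            sense = top.group(1).strip()
            continue
        if "{{trans-bottom" in line:
            sense = ""
            continue
        for code, term in TRANS_ITEM.findall(line):
            if code == iso:
                yield (title, sense, term.strip())


def to_tsv(rows):
    """Join rows into tab-separated lines, one translation per line."""
    return "\n".join("\t".join(row) for row in rows)


# Made-up sample entry text, standing in for one page of the dump.
SAMPLE = """\
====Translations====
{{trans-top|thin cake}}
* Finnish: {{t+|fi|pannukakku}}, {{t|fi|lettu}}
* Spanish: {{t|es|panqueque|m}}
{{trans-bottom}}
"""

if __name__ == "__main__":
    print(to_tsv(extract_translations("pancake", SAMPLE, "fi")))
```

Keeping the language code as a parameter mirrors the -v ISO=fi switch in the gawk command line, so the same sketch could be pointed at another language without changes.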
Hey Matthias, thank you for continuing to maintain this. I'm still benefiting from it, two years later. It is nice to get updates on the Finnish dictionary from time to time, especially since it is constantly growing. I've migrated from a Blackberry to an Android phone which has some dictd-ready apps. This is good because I no longer have to convert your dict files into eBooks for use on my Blackberry, and it shows that your efforts are getting attention from app developers. It is hard to find good dictionaries for strange languages such as Finnish, so again, thank you. :) ~ heyzeuss 17:51, 7 May 2013 (UTC)
zwar
Moin. Some time ago you created an entry for zwar and gave certainly and indeed as translations there. I recently deleted this, since in my entire life, advanced German course (Leistungskurs) included, I have never come across this meaning. Duden gives a Middle High German form with a translation for "fürwahr" (indeed), but Middle High German and High German are treated as different languages here. Since, going by your surname, I spontaneously take you for a southern German, there might be the possibility of adding this meaning back as dialectal. Where did you get this translation from, and would you like to attest it and re-add it? Otherwise I would leave it deleted, since according to Duden it could at most count as archaic, and even that only with misgivings. Kind regards, Dakhart 01:09, 20 July 2011 (UTC)
Need a favor
If you have a minute of free time, do you think you could translate the book quotation I added to Pornografie? It's rather more grammatically involved than I'm comfortable attempting... — [Ric Laurent] — 16:41, 16 September 2011 (UTC)
- That quote sounds horribly screwed up, in a way typical for humanistic sciences, to me. Anyhow I've tried to do my best to translate it. Matthias Buchmeier 10:07, 19 September 2011 (UTC)
- Yeah, it did look kinda odd... I figured it was just too advanced for me, but I'm glad to know it wasn't all me :) At least it wasn't a philosopher, though, I find nobody rambles on more about absolutely nothing than they do. — [Ric Laurent] — 10:50, 19 September 2011 (UTC)
You might want to check out what your bot's done to Portuguese past participles — [Ric Laurent] — 15:45, 4 October 2011 (UTC)
feed Portuguese bot
Hi MB, I heard you've moved into the sleazy world of bot-creating Portuguese verb forms. I've got a few I'd like you to add entries for. Is there a feed page for Portuguese verbs? Maybe on User:BuchmeierBot/FeedMe you could add a section for Portuguese, which would be most lovely. --Rockpilot 14:34, 19 October 2011 (UTC)
- I have added a Portuguese section to User:BuchmeierBot/FeedMe. Matthias Buchmeier 09:17, 20 October 2011 (UTC)
BuchmeierBot
I found a Portuguese verb which had an incorrect conjugation table, distribuir. BuchmeierBot generated some incorrect entries from that table. What should I do about it? RFD? Ungoliant MMDCCLXIV 17:33, 10 February 2012 (UTC)
- I have removed the forms, which are incorrect according to the Portuguese wiktionary. Is their conjugation table correct? Matthias Buchmeier 10:36, 13 February 2012 (UTC)
- Thanks. There is one mistake in their table, the 2nd person singular infinitive is distribuires but should be distribuíres. You deleted it anyway, so it's all fine now. By the way, I've created the conjugation template needed for distribuir now. Ungoliant MMDCCLXIV 13:02, 13 February 2012 (UTC)
Russian DB dump
Hi,
I like your translation DB dump. Could you repeat it for Russian, please? Do you accept requests for other languages? In particular, I'm interested in similar dumps for Mandarin, Japanese and Arabic. Your frequency lists look impressive too, perhaps you could share them on Wiktionary:Frequency lists. --Anatoli (обсудить) 22:48, 20 February 2012 (UTC)
- Do you mean an update of "English-Russian" from yesterday's dump? I do this regularly shortly after the dumps are updated (approximately 3 times a month); the update is on the way. Adding more languages is easily possible. For Arabic I'm not sure whether it makes more sense to include all dialects such as Egyptian, Moroccan etc. or to generate a dictionary for classical Arabic. Any hints? Matthias Buchmeier 11:04, 21 February 2012 (UTC)
- Thanks for the reply. Yes, that's the one. So you do it quite regularly! Awesome! There's also Index:Russian, which includes both the entries and translations but it's getting out of date rather quickly and it doesn't have English definitions; User:Conrad.Irwin must be busy with other stuff. Your DB dump looks more like a small dictionary in itself. Very useful for me, as I can see which transliterations I need to fix. For Arabic I would include only Arabic (MSA); the dialect translations are of worse quality and the majority of standard Arabic words are also dialectal, with differences in pronunciation and frequency. Very eager to see English-Mandarin, English-Japanese and English-Arabic. If you say it's easy I'd like to ask you for English-Persian, English-Hindi, English-Vietnamese and English-Thai as well - languages I rather regularly make translations for (you already do English-French and English-German). But the other 3 are of a higher priority for me. Please consider adding your dictionaries to the main space, categorised by the languages they are for, with some regular updates. Good luck and thanks for your efforts! --Anatoli (обсудить) 22:33, 21 February 2012 (UTC)
- Also, would be fantastic if you could do the FL-English dictionaries as well, like your "From Non-English Language Entries" section. :) --Anatoli (обсудить) 22:35, 21 February 2012 (UTC)
- Of course the intention in creating these files was to have small offline dictionaries. They can be used where you don't have internet access (which, in spite of smartphones, happens quite often), and moreover the search functionality of dictionary software such as DING is much smarter and faster than an online search on en.wiktionary. I have just uploaded the English-Japanese dictionary. It took me longer than expected to generate it, as some code modification was necessary. The point is that Japanese translations tend to have nested template calls (mostly {{l}} but also {{t}}) inside the tr-parameter of {{t}}-templates. Do you know if this type of nesting is supposed to be encouraged? Maybe it would be helpful to add a Japanese Translations section describing the recommended formatting to WT:About Japanese. Currently such a section is also missing in WT:About Arabic. The English-Mandarin and English-Arabic dictionaries also look likely to take me some more time (days), because Chinese translations tend to contain special templates such as {{zh-ts}}, while Arabic contains a zoo of dialects. Is English-MSA supposed to include only those translation lines beginning with * Arabic: and *: MSA:? What are the MSA ISO codes? Are they only ar and arb? Maybe it would be helpful to add all Arabic dialects and their corresponding ISO codes as a reference table on WT:About Arabic. The generation of FL-English dictionaries is only easy when the formatting of the FL sections is more or less consistent. Unfortunately, for most FLs quite the contrary is the case; the formatting tends to be very heterogeneous and inconsistent, as a consequence of historical changes in formatting policy and template use. Therefore some cleanup work on the FL sections might be required before an FL-English dictionary can be generated. Matthias Buchmeier 16:23, 22 February 2012 (UTC)
- Thanks for the reply. No, the non-standard templates are discouraged. Some bots generated additional templates inside {{t}} but I have been correcting this. Also, {{zh-ts}} and {{zh-tsp}} still appear. We have been converting this (me and Tooironic). Agree with the need to standardise translations and adding this into About pages. I'm a bit confused about your Arabic questions. Mixing MSA (simply "ar" is used; I'm not aware of "arb") with dialects would be messy. I don't think it's a very good idea (I'd like to have a list of translations where there is a translation into a dialect but not into MSA). Thanks for generating the ja-en dictionary! --Anatoli (обсудить) 22:59, 22 February 2012 (UTC)
- arb is the ISO code for MSA, but three-letter ISO codes (also arz etc.) should probably not be used in Arabic t-templates, as this will break the linking to ar.wiktionary. The use of {{zh-ts}} in translation sections is encouraged by WT:About Chinese, which states "This template may be used in places where both the simplifed and traditional versions should be placed side by side (ex. translations section for English entries).". If you don't want these templates to be used, they really should be explicitly discouraged in the section "Translations into Chinese languages/dialects/topolects". Matthias Buchmeier 10:32, 23 February 2012 (UTC)
- I have removed the misleading and contradictory section, thanks for the extracts! --Anatoli (обсудить) 01:46, 24 February 2012 (UTC)
- For now I have included the variants
- "Arabic|MSA|Standard Arabic"
- and
- "Mandarin|Central Mandarin|Jianghuai Mandarin|Northern Mandarin|West Mandarin"
- and excluded
- "Algerian| Andalusian|Bahrani|Chadian|Egyptian|Egyptian Arabic|Gulf|Gulf Arabic|Hassānīya|Iraqi|Iraqi Arabic|Lebanese|Lebanese/Syrian|Levantine|Levantine Arabic|Libyan|Moroccan|Moroccan Arabic|Morocco|North Levantine Arabic|Palestinian|Palestinian Arabic|South Levantine Arabic|Syrian|Sudanese|Tunisian Arabic|UAE"
- and
- "Amoy|Bai|Cantonese|Changsha|Chaozhou|Chengdu|Dungan|Eastern Hokkien|Eastern Min|Fuzhou|Gan|Guangzhou|Haikou|Hainanese|Hakka|Hangzhou|Harbin|Hokkien|Hui|Jian[']ou|Jin|Jixi|Liuzhou|Meixian|Min Bei|Min Dong|Min-nan|Min nan|Min Nan|Min-Nan|Nanchang|Nanning|Northern Hokkien|Northern Min|Northern Wu|Old Chinese|Pinghua|Shanghai|Shanghainese|Sichuanese|Southern Min|Southern Wu|Suzhou|Taiyuan|Taiwan|Taiwanese|Teochew|Tuhua Dong[']an|Ürümqi|Wenzhou|Wu|Wuhan|Xiang|Xiamen|Xi[']an|Xuzhou|Yangzhou|Yue"
- for the MSA and Mandarin dictionaries respectively. This was only a very vague guess on my part, as I don't know much Arabic or Mandarin. I would appreciate your suggestions on how to proceed in the future. Matthias Buchmeier (talk) 17:51, 24 February 2012 (UTC)
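The include/exclude lists above read like regular-expression alternations. As a rough illustration (not the actual extraction script), a filter that keeps only translation lines whose language label is one of the included variants might look like this in Python. It assumes the labels appear as `* Arabic:` or `*: MSA:` at the start of a line, as described earlier in this thread, and it treats each variant name as a literal string; the function names are hypothetical.

```python
import re

# The included variants for the MSA dictionary, as listed in the thread.
INCLUDED_ARABIC = "Arabic|MSA|Standard Arabic"


def make_filter(variants):
    """Build a pattern matching translation lines labelled with one of the variants."""
    names = "|".join(re.escape(name.strip()) for name in variants.split("|"))
    # Assumption: a translation line starts with "* Label:" or "*: Label:".
    return re.compile(r"^\*:?\s*(?:" + names + r"):")


def keep_line(line, pattern):
    """Return True if the line belongs to one of the included variants."""
    return pattern.match(line) is not None
```

Because the label must be followed directly by a colon, a line such as `*: Egyptian Arabic: ...` is rejected even though it contains the word "Arabic".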
- It all seems very complicated; thanks for your efforts. Perhaps if you continue doing it the same way, it'll be all right. I now have the chance to check the translations more easily. I noticed a strange thing, though. hyper- doesn't have a Chinese translation at all, only a Japanese one (which is actually the same in this case) - 超. Do you know what happened? --Anatoli (обсудить) 22:32, 26 February 2012 (UTC)
- It's not so complicated. One only has to know or guess which variants are considered part of MSA and Mandarin respectively, and which are dialectal. 'hyper-' is actually also there in the Chinese list. The problem is the sorting, which seems to ignore the hyphen and other special characters, so 'hyper-' is placed between 'hypernym' and 'hypertension'. I've fixed this problem and will update it with the next extracts. Matthias Buchmeier (talk) 10:29, 27 February 2012 (UTC)
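The sorting effect described above can be reproduced with a small Python sketch. It assumes, purely for illustration, that each dictionary line has the form `headword {pos} :: translation`; the first key mimics a collation that ignores punctuation and spaces across the whole line, and the second is one possible fix (not necessarily the one actually applied) that compares headwords only.

```python
import re


def headword(line):
    """Return the headword, assuming the format 'headword {pos} :: translation'."""
    return line.split(" ::", 1)[0].split(" {", 1)[0]


def punctuation_blind_key(line):
    """Mimic a collation that ignores punctuation and spaces in the whole line."""
    return re.sub(r"[^\w]", "", line).lower()


def headword_key(line):
    """Sort by headword only; the raw headword breaks ties."""
    hw = headword(line)
    return (re.sub(r"[^\w]", "", hw).lower(), hw)


lines = [
    "hypertension {n} :: z",
    "hyper- {prefix} :: y",
    "hypernym {n} :: x",
]
misplaced = sorted(lines, key=punctuation_blind_key)
fixed = sorted(lines, key=headword_key)
```

With the punctuation-blind key, the line for hyper- is compared as "hyperprefixy" and therefore lands between "hypernym" and "hypertension"; with the headword key it sorts first.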
Note I've added the figurative sense. The literal sense should probably be deleted, or at least soft-redirected. What you reckon? ---> Tooironic 12:22, 23 February 2012 (UTC)
- Isn't it also an insult, "that comment was a real slap in the face". Mglovesfun (talk) 12:24, 23 February 2012 (UTC)
- So where would you suggest putting the translations for the literal SOP meaning? Matthias Buchmeier (talk) 17:52, 24 February 2012 (UTC)
corrections
Thanks for checking over my contributions, and correcting any formatting issues. I hope to learn it all quickly --Cova (talk) 17:32, 2 March 2012 (UTC)
Bilingual dictionary for Tamazight
Hello Matthias. Would it be possible to create such a list for Central Atlas Tamazight? It would be helpful. I think it can fit on one page because there are only about 100-200 translations. Maro 18:52, 19 March 2012 (UTC)
- No problem, here you are: en-tzm (a-z). Matthias Buchmeier (talk) 12:07, 20 March 2012 (UTC)
- Many thanks :). Maro 19:06, 20 March 2012 (UTC)
A Question
- Hello Matthias,
- Could your bot also create a Kurdish-English dictionary, if that wouldn't be asking too much of it? Thanks in advance. George Animal (talk) 16:48, 17 April 2012 (UTC)
- English-Kurdish (from English translations) is no problem. For Kurdish-English (from Kurdish entries) I would need a sorting algorithm (at least for the version in Arabic script). Matthias Buchmeier (talk) 16:58, 17 April 2012 (UTC)
- English-Kurdish would be enough, if that worked. Thank you very much. Very kind of you. Kind regards, George Animal (talk) 16:59, 17 April 2012 (UTC)
- Would you rather have Kurmancî and Soranî together in one dictionary, or separately? Matthias Buchmeier (talk) 17:10, 17 April 2012 (UTC)
- I don't want to cause you any trouble; separately would be better, otherwise one would get confused rather than understand it. You're great. George Animal (talk) 17:13, 17 April 2012 (UTC)
- At the moment it doesn't seem to make sense to separate Kurmancî and Soranî, since they occur mixed in many entries. I'll start the script then; the dictionary should be ready in about 20 minutes. Matthias Buchmeier (talk) 17:30, 17 April 2012 (UTC)
- You're right. OK, you can do it that way. George Animal (talk) 17:32, 17 April 2012 (UTC)
- I really don't know how to thank you. Kind regards, George Animal (talk) 17:34, 17 April 2012 (UTC)
- Don't mention it. PS: If you have the time and inclination to clean up the Kurdish translations, I can generate a list of entries with missing "Kurmancî/Soranî" nesting for you. As soon as they are cleanly separated in the translations, I can also create separate dictionaries. Matthias Buchmeier (talk) 17:38, 17 April 2012 (UTC)
- Thanks. That would be nice. I'll gladly do that. George Animal (talk) 17:41, 17 April 2012 (UTC)
A small question
- Hi Matthias,
- as you can see, I am currently trying to create Kurdish verb forms, but somehow it is hard to manage all of that at once, and doing it manually is very laborious. I raised the question in the Tea room, but the answers were not very positive. Nobody dared to take on this challenge, namely creating the Kurdish verb forms by bot. Mglovesfun said that bot operators should know the language well, which is impossible, however, because none of the bot operators knows Kurdish. So I wanted to ask whether your bot could do it. I know that you don't speak Kurdish. If it can't, could you (you don't have to) suggest how I might get over this mountain of verb forms? Kind regards to you. GeorgeAnimal. 09:56, 1 May 2012 (UTC)
- Hi George,
- creating the verb forms by hand is indeed not a good idea, as it is extremely time-consuming and error-prone. I could create the verb forms by bot, but I don't dare to, since faulty entries generally have to be corrected manually, and a bot quickly produces many thousands of entries. Do you know any programming language? If you could manage to generate the wikitext for the conjugated forms in some automated way, the rest would be quite easy. Matthias Buchmeier (talk) 10:15, 2 May 2012 (UTC)
- Hi Matthias,
- thanks for the quick reply. Unfortunately I don't know a single programming language; that is precisely the problem, otherwise I would have created it the way you explained above. That is why I am afraid of creating so many faulty forms, which would mean a lot of work later. Thanks anyway. Then I'll just create them manually. Creating all the forms of the verbs that are entered now would take me a few weeks, since I also have very little time. Otherwise I would have created them in a week. I didn't quite understand your second suggestion, the one about wikitext. Could you explain it for me? Thanks and kind regards, GeorgeAnimal. 13:06, 2 May 2012 (UTC)
Hi George, Word, Excel, or the search-and-replace function of your text editor will also do as a "programming language". The automated creation can work, for example, by your creating one text file per verb type that contains all the entries to be generated (in wiki format). So that the bot can tell the individual entries apart, each entry begins with:
{{-start-}}
<<<EINTRAG-NAME>>>
and ends with:
{{-stop-}}
You can create one such text file per conjugation type with all the entries to be generated, using e.g. the placeholders from the respective template, such as {{{1}}} etc. Then, for each verb, you simply replace the placeholders {{{1}}}, {{{2}}}, {{{3}}}, etc. with the respective values. Once you have got that far, I can upload the pages for you with my bot, or you can install the pywikipedia bot on your PC yourself. Matthias Buchmeier (talk) 13:52, 2 May 2012 (UTC)
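The placeholder replacement described above can also be done offline with a few lines of Python instead of a text editor. The following is only a sketch: the entry template is a simplified, hypothetical example in the spirit of this thread, the parameter values are illustrative, and the handling of defaults is a simplification of how MediaWiki treats `{{{n|default}}}`.

```python
import re

# Hypothetical, simplified entry template for one verb form.
ENTRY_TEMPLATE = """{{-start-}}
<<<{{{6}}}{{{3}}}{{{1}}}im>>>
==Kurdish==
===Verb===
{{head|ku}}
# {{ku-verb form of|{{{6}}}{{{4}}}in|1|s|g}}
{{-stop-}}"""

# Matches {{{n}}} and {{{n|default}}} for a numeric n.
PLACEHOLDER_RE = re.compile(r"\{\{\{(\d+)(?:\|([^{}]*))?\}\}\}")


def fill(template, params):
    """Replace numbered placeholders with the positional values in params."""
    def repl(match):
        index = int(match.group(1))
        if 1 <= index <= len(params):
            return params[index - 1]
        default = match.group(2)
        # No value given: use the default if there is one, else keep the placeholder.
        return default if default is not None else match.group(0)
    return PLACEHOLDER_RE.sub(repl, template)


# Illustrative values only: present stem, (unused), prefix, past stem, (unused), preverb.
entry = fill(ENTRY_TEMPLATE, ["bêj", "", "di", "got", "bi", "ve"])
```

Running `fill` once per verb and concatenating the results gives one text file of entries delimited by `{{-start-}}` and `{{-stop-}}`. The output should still be checked by hand before any upload, since template errors multiply quickly.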
- Hi,
- can your bot do anything with existing conjugation templates? Like here [4], here, etc. Kind regards, GeorgeAnimal. 17:43, 2 May 2012 (UTC)
- Some verbs with templates: dîtin, teqandin, kelandin, qelandin, cemidandin. GeorgeAnimal. 17:45, 2 May 2012 (UTC)
- Not without further work;
- as I said, I need a file that looks, e.g. for ku-conj-tin, like this:
{{-start-}}
<nowiki><<<</nowiki>{{{2|}}} {{{6}}}{{{3}}}{{{1}}}im<nowiki>>>></nowiki>>
==Kurdish==
===Verb===
{{head|ku}}
# {{ku-verb form of|{{{2|}}} {{{4}}}in|1|s|g}}
{{-stop-}}
{{-start-}}
<nowiki><<<</nowiki>{{{2|}}} {{{6}}}{{{3}}}{{{1}}}î<nowiki>>>></nowiki>>
==Kurdish==
===Verb===
{{head|ku}}
# {{ku-verb form of|{{{2|}}} {{{4}}}in|1|p|g}}
# {{ku-verb form of|{{{2|}}} {{{4}}}in|2|p|g}}
# {{ku-verb form of|{{{2|}}} {{{4}}}in|3|p|g}}
{{-stop-}}
and so on for all verb-form entries to be generated
- Then you can use e.g. {{subst:User:George Animal/ku-conj-tin-alleFormen||bîn||di|dît|bi|}} (if this file is called User:George Animal/ku-conj-tin-alleFormen) to generate all verb-form entries for dîtin and then upload them by bot. I hope my explanation was clear enough. Best regards, Matthias Buchmeier (talk) 10:03, 3 May 2012 (UTC)
- Many thanks.
PS: The bot created all the entries of the verb gotin for the verb gotin. Is something missing from my template? Kind regards, GeorgeAnimal. 16:39, 3 May 2012 (UTC)
- vegotin is a different verb; it differs from gotin by the prefix ve-. GeorgeAnimal. 16:40, 3 May 2012 (UTC)
- Otherwise all the forms of dîtin created by the bot are correct. Thanks, very kind of you. GeorgeAnimal. 16:42, 3 May 2012 (UTC)
- Next time I will mark the verbs as verbs with a suffix, so that no wrong forms are created. GeorgeAnimal. 16:46, 3 May 2012 (UTC)
- That is probably because there still seem to be errors in the template User:George Animal/ku-conj-tin-alleFormen. I generated the forms with {{subst:User:George Animal/ku-conj-tin-alleFormen|bêj||di|got|bi|ve}}. In future, could you please generate the entries by adding {{subst:User:George Animal/ku-conj-tin-alleFormen|PARAMETER}} to User:BuchmeierBot/FeedMe-Kurdish and then check (in the wikitext, via (edit)) whether everything is correct? Matthias Buchmeier (talk) 16:50, 3 May 2012 (UTC)
- OK, I'll do that and check the entries to see whether the forms there are correct too. GeorgeAnimal. 16:55, 3 May 2012 (UTC)
- PS: it looks as if User:George Animal/ku-conj-tin-alleFormen should in each case contain
# {{ku-verb form of|{{{2|}}} {{{6}}}{{{4}}}in|1|s|g}}
- and not
# {{ku-verb form of|{{{2|}}} {{{4}}}in|1|s|g}}
- i.e. you forgot parameter no. 6. Matthias Buchmeier (talk) 16:59, 3 May 2012 (UTC)
- Yes, the one in the infinitive and some other forms. How did I overlook that? Sorry. GeorgeAnimal. 17:02, 3 May 2012 (UTC)
- It's not a big deal; just please correct the faulty entries. Matthias Buchmeier (talk) 17:06, 3 May 2012 (UTC)
- I have fixed the error; see the verb vegotin. The forms of -andin are correct; I have tested them and will now check whether there are any errors. Then I will add them with {{subst:User:George Animal/ku-conj-tin-alleFormen|PARAMETER}} to User:BuchmeierBot/FeedMe-Kurdish. GeorgeAnimal. 17:45, 3 May 2012 (UTC)
- There still seem to be errors at least in vebibêjim, vegotiye, and vedibêjim. Could you please check the entries again? Matthias Buchmeier (talk) 09:43, 4 May 2012 (UTC)
- I had forgotten them, but they are corrected now. For the forms of verbs ending in -andin, everything seems to be correct. Your bot does good work. The Kurdish language thanks you for your efforts. Kind regards, GeorgeAnimal. 14:05, 4 May 2012 (UTC)
- Thanks, it turned out great. Without your help I would be at my wits' end. I'll make a page for the verbs ending in -irin right away. This time I'll also add the subjunctive forms. I forgot them for -andin. Is it all right if I add them later? Is that OK for your bot? GeorgeAnimal. 15:29, 4 May 2012 (UTC)
- Adding forms later is OK, since only entries without a Kurdish section are created. Just please take great care that there are no errors. These generally have to be corrected by hand, which becomes very tiring once there are a few hundred entries (which a bot produces very quickly). Matthias Buchmeier (talk) 15:37, 4 May 2012 (UTC)
- Your bot could now go on creating the forms of verbs ending in -andin. I have checked the entries created. They are correct. I will still work on the other forms and fix the errors, adding things where necessary. Kind regards, GeorgeAnimal. 15:43, 4 May 2012 (UTC)
- Hi Matthias
- Should the negated forms be created, or are they superfluous? I have added the negated forms to the tables. It looks better that way. Kind regards, GeorgeAnimal. 15:58, 11 May 2012 (UTC)
- Since they differ from the non-negated forms, I think they should definitely be created. Matthias Buchmeier (talk) 16:02, 11 May 2012 (UTC)
- Hi Matthias
- I have just downloaded pywikipediabot. I followed the instructions on Wikibooks, but somehow I cannot log in. I created the file userconfig.py with an editor in the main pywikipedia folder. But when I enter the code in cmd, that editor appears again with the file I edited, containing: lang= username= ... etc. What could the main problem be? Thanks in advance. GeorgeAnimal. 16:45, 24 May 2012 (UTC)
- In my case the file is called user-config.py and has the following content:
family = 'wiktionary'
mylang = 'en'
usernames['wiktionary']['en'] = 'BuchmeierBot'
It's been a while since I configured pywikipediabot, but it worked for me without problems. Matthias Buchmeier (talk) 08:02, 29 May 2012 (UTC)
BuchmeierBot: incorrect Portuguese conjugations
Hello. The conjugation table for progredir was incorrect. As a result, BuchmeierBot created the following incorrect entries (in all cases the last <e> should be an <i>):
I will fix the conjugation table soon. — Ungoliant (Falai) 01:02, 26 June 2012 (UTC)
- OK, I've deleted the erroneous forms, and will rerun the bot to create the missing forms. Matthias Buchmeier (talk) 09:12, 26 June 2012 (UTC)
- You deleted the lemma! Oops! — Ungoliant (Falai) 15:31, 26 June 2012 (UTC)
Huge praise
- Hi Matthias
- First of all, I would like to express huge praise to you for creating the Kurdish verb forms by bot. I also have a question concerning the negated verb forms. How should I phrase them? Would this work: First-person singular present negative form of {{{verb}}}. Kind regards --GeorgeAnimal. 10:22, 11 July 2012 (UTC)
- For Spanish negative imperative, it looks like in pidas. Matthias Buchmeier (talk) 10:44, 11 July 2012 (UTC)
Not all qualifiers are glosses
Hi, BuchmeierBot is changing all instances of {{qualifier}} to {{gloss}} in French definition lines, but many of them are not glosses and should remain {{qualifier}}. —Angr 15:55, 11 July 2012 (UTC)
- No it's not changing all instances. I am running the bot in manual mode, that means I will check for each replacement, if it makes sense, before submitting. Matthias Buchmeier (talk) 16:08, 11 July 2012 (UTC)
- I'm not sure what {{gloss}} says, but I thought the use of {{gloss}} in definitions was correct; I never use qualifier in a definition line, I either use context or gloss. Mglovesfun (talk) 16:34, 11 July 2012 (UTC)
- People use {{qualifier}} on foreign language (FL) definition lines in order to qualify the English translations rather than the FL-entry, i.e. most commonly {{qualifier|UK}} and {{qualifier|US}}. I have also seen that sometimes {{qualifier}} is used as a replacement for {{context}}, which probably does no harm, as the main difference between them is that {{context}} categorizes while {{qualifier}} doesn't. Matthias Buchmeier (talk) 16:44, 11 July 2012 (UTC)
- (ec) I'd say a gloss is a brief statement indicating which meaning of a polysemous English word is intended, e.g. mole (animal) vs. mole (dark spot on the skin) vs. mole (unit in chemistry), etc. The ones I've changed back from {{gloss}} to {{qualifier}} are [5], [6], and [7]. These are further explanations about how the French word is used; they are neither glosses of the English word nor contexts in which the French word is found. —Angr 16:53, 11 July 2012 (UTC)
- Your back-changes look OK. Maybe I have been a bit too precipitate. Matthias Buchmeier (talk) 17:00, 11 July 2012 (UTC)
Indonesian and Malay dictionaries
Hello Matthias,
A new Wiktionary user here. I like the dictionaries on your user page. Is it possible to add more lists for English-Indonesian and English-Malay? They will be useful for Malay speakers and learners. :) Really appreciate your help, thanks! Raymondhs (talk) 19:52, 11 July 2012 (UTC)
- Hi, I have generated and uploaded the English-Malay dictionary. For Indonesian there seem to be many variants, e.g. Acehnese, Balinese, Banjarese, Buginese, Javanese, Madurese, Minangkabau, Sundanese etc. Could you tell me which variants should be included in or excluded from the dictionary? Matthias Buchmeier (talk) 09:10, 12 July 2012 (UTC)
- Thanks for uploading the en-ms dictionary! I think it's better to just include the standard Indonesian; it's the most widely used in Indonesian literature. It will be a bit messy to include the other local dialects. Raymondhs (talk) 09:52, 12 July 2012 (UTC)
A question
- Greetings, Matthias
- I have a problem with the template template:ku-noun: I cannot get the plural forms to link. I asked Mglovesfun, but he didn't reply. Then I thought of you. Could you fix it? If it can't be done, you can leave it. Kind regards --GeorgeAnimal. 09:42, 13 July 2012 (UTC)
- Try to fix it yourself. Ideally you should first insert comments with line breaks to make it more readable. Then count the brackets; there are surely too many, too few, or some in the wrong place. Good luck. Matthias Buchmeier (talk) 10:02, 13 July 2012 (UTC)
- Could you create these forms [8]? The bot created some of them; I have now added the negative forms. Thanks and regards --GeorgeAnimal. 11:58, 13 July 2012 (UTC)
A request
- Hi Matthias!
- I had downloaded pywikipediabot, but the folder was empty. Then I followed the instructions on Meta-Wiki and created a file named login.py with the contents (name, family etc.), and it is now in the pywikipediabot folder. When I enter the bot's name and password in the console, the bot's name appears, but below it it says usernames is not defined. Do I need to do something else to be able to log in at all? Thanks! GeorgeAnimal. 10:19, 2 August 2012 (UTC)
- Do you have user-config.py (see above)? Matthias Buchmeier (talk) 14:25, 2 August 2012 (UTC)
- My user-config.py has the following content:
family = 'wiktionary'
mylang = 'ku'
usernames['wiktionary']['ku'] = u'the name of my bot'
console_encoding = 'utf-8'
- I added the Python link on the desktop to the path in the environment variable, preceded by a semicolon. Somehow I cannot log in via cmd.
- --Kind regards --GeorgeAnimal. 17:46, 2 August 2012 (UTC)
- Try starting it from the pywikipedia directory. You also of course have to set the bot name correctly (it must be a valid account on ku.wiktionary). Matthias Buchmeier (talk) 08:46, 3 August 2012 (UTC)
- Hi
- I have downloaded the bot and it now works flawlessly. I'm having difficulties uploading some files. How do you create new pages with your bot? (I have read the above, but I lack the basics.) --GeorgeAnimal. 12:28, 5 August 2012 (UTC)
- Would a file, e.g. dict.txt, like this work for the Kurdish Wiktionary:
<<-start->>
(((diqelînim)))
== {{=ku=}} ==
{{tew|ku}}
=== Wate ===
#1. kes dema sade a niha ji lêker [[qelandin]]
<<-stop->>
<<-start->>
(((amade dikî)))
== {{=ku=}} ==
{{tew|ku}}
=== Wate ===
#2. kes dema sade a niha ji lêker [[qelandin]]
<<-stop->>
- Then:
- pagefromfile.py [global-arguments] -start:{{-start-}} -end:{{-stop-}} -file:dict.txt
- Is this right if I want to create conjugated forms by bot? I tried, but a text appeared with the content:
- Traceback (most ....)
- File pagefromfile.py
... UnicodeDecodeerror 'uft-8' codec can't decode Kind regards --GeorgeAnimal. 10:25, 6 August 2012 (UTC)
- Of course not; start, end, titlestart, and titleend must of course match the file. Here they would be e.g. <<-start->>, <<-stop->>, ((( and ))). However, the round brackets do not work without further ado (they are regex special characters); better use e.g. <<<. I hope that helps. Matthias Buchmeier (talk) 11:26, 6 August 2012 (UTC)
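To illustrate the point about markers and regex special characters, here is a small Python sketch of how a marker-delimited file could be split into titles and page bodies. It is not the code of pagefromfile.py; the function name and argument names are hypothetical, and the only assumption is that markers are matched as regular expressions, which is why unescaped round brackets cause trouble.

```python
import re


def parse_entries(text, start, end, title_start, title_end):
    """Return (title, body) pairs found between the start and end markers."""
    # re.escape makes every marker a literal, so "(((" is safe to use here.
    entry_re = re.compile(
        re.escape(start) + r"\s*"
        + re.escape(title_start) + r"(.*?)" + re.escape(title_end)
        + r"\s*(.*?)\s*" + re.escape(end),
        re.DOTALL,
    )
    return [(m.group(1), m.group(2)) for m in entry_re.finditer(text)]


sample = """<<-start->>
(((diqelînim)))
== {{=ku=}} ==
{{tew|ku}}
<<-stop->>
"""
entries = parse_entries(sample, "<<-start->>", "<<-stop->>", "(((", ")))")

# Used as a raw pattern, "(((" is not a valid regular expression at all.
try:
    re.compile("(((")
    raw_brackets_ok = True
except re.error:
    raw_brackets_ok = False
```

The markers passed to the parser have to be exactly the ones used in the file; markers such as `{{-start-}}` will simply find nothing in a file delimited by `<<-start->>`.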
- Hi Matthias
- Thanks. Now I've managed it: my bot's contributions. I have submitted a request for the flag and am still waiting for the outcome. Kind regards --GeorgeAnimal. 15:35, 10 August 2012 (UTC)
- Congratulations. Matthias Buchmeier (talk) 15:50, 10 August 2012 (UTC)
- If it gets bot status, I hope to relieve you and your bot of creating Kurdish verb forms and, if need be, take that over. --GeorgeAnimal. 16:02, 10 August 2012 (UTC)
Translation targets
editGiven the massive amount of German entries that describe in one word what English uses several words for, would you put any limits on what would be allowable as a translation target, other than being a single word in another language? For example I've come across Lieblingstrick, so does tat justify an English entry for favorite trick and favourite trick? What about boat travel for Schifffahrt? Mglovesfun (talk) 10:38, 16 August 2012 (UTC)
- Yes, I would restrict its usage. I think the purpose of a translation target entry is to be able to find the translation of an English term in language X. If the translation follows in a straightforward way from the standard translations of the English parts, then I think the entry is superfluous. Whether the translation is single-word or multi-word is IMHO irrelevant, as this depends on the grammatical features of the language, as has been pointed out fairly well. In the above example, Lieblings- is the usual translation for favourite and can be used in combination with practically any noun, and trick is the usual translation for German Trick. The same is true for Schifffahrt and boat travel, so in that case someone with some basic knowledge of German could infer the German translations, and the English entries are not needed. What I would propose as translation-target CFI is something like a requirement to have either i) idiomatic translations, or ii) translations that cannot be easily inferred from the translations of the English parts. Matthias Buchmeier (talk) 11:39, 16 August 2012 (UTC)
Ding dictionaries
Hello Matthias!
Many thanks for the Ding dumps! Is it possible to create non-English dictionaries as well? I would be interested in Polish-Russian.
Thanks again for your contribution,
Bob
- That should be relatively easy to combine from en-pl and en-ru if you can program a little. Unfortunately I don't have time for it at the moment, but I will certainly come back to it later. Matthias Buchmeier (talk) 09:04, 21 August 2012 (UTC)
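As a rough illustration of the combination mentioned in the reply above, here is a minimal Python sketch that pairs Polish and Russian translations sharing the same English headword. It assumes ding-style lines of the form `headword :: translation, translation`; the function names `parse_ding` and `pivot` and the sample entries are invented for the example and are not taken from the actual scripts.

```python
from collections import defaultdict


def parse_ding(lines):
    """Map each English headword to its list of translations."""
    entries = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#") or " :: " not in line:
            continue  # skip comments, blank lines and malformed lines
        head, _, trans = line.partition(" :: ")
        head = head.strip()
        for word in trans.split(","):
            word = word.strip()
            if word and word not in entries[head]:
                entries[head].append(word)
    return entries


def pivot(dict_a, dict_b):
    """Pair up translations that share the same English headword."""
    result = defaultdict(list)
    for head, words_a in dict_a.items():
        for word_a in words_a:
            for word_b in dict_b.get(head, []):
                if word_b not in result[word_a]:
                    result[word_a].append(word_b)
    return result


# Invented sample data in the assumed format.
en_pl = parse_ding(["# English :: Polish",
                    "dog {n} (animal) :: pies",
                    "cat {n} (animal) :: kot"])
en_ru = parse_ding(["# English :: Russian",
                    "dog {n} (animal) :: собака, пёс",
                    "house {n} :: дом"])
pl_ru = pivot(en_pl, en_ru)
for word, translations in sorted(pl_ru.items()):
    print(f"{word} :: {', '.join(translations)}")
```

Because the whole headword, including the part-of-speech tag and the sense gloss, is used as the join key, only translations of the same sense are paired. The price is that entries whose glosses differ slightly between the two files are missed.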
I look for Arabic - English Dictionary
Dear Matthias, I am looking for an Arabic-English or Arabic-Turkish dictionary. If you can help with this, I would be grateful. My e-mail address: a_r_agaoglu@yahoo.com
- You can follow the download link Here to get the English-Arabic dictionary extracted from en.wiktionary.org. This archive contains files to be used with ding or dictd compatible programs, e.g. goldendict and many more. However, the coverage of English-Arabic translations on en.wiktionary is not very complete yet. Another, better open-source English-Arabic dictionary is available from the arabeyes project; a dictd version of this dictionary can be downloaded from freedict. I hope that helps you. Matthias Buchmeier (talk) 13:37, 11 October 2012 (UTC)
Since you voted in the recent Beer Parlour poll about Tabbed Languages, I'm informing you of this ongoing vote on whether to enable Tabbed Languages. --Yair rand (talk) 08:39, 27 November 2012 (UTC)
Anatoli mentioned here that you might be able to make more indices. Perhaps the rest of the conversation would be interesting to you as well? Thanks —Μετάknowledgediscuss/deeds 02:08, 13 December 2012 (UTC)
Multi-gender f-p, m-p, n-p, m-f, etc. for Russian, German, French, etc.
Hi,
Could you please change your awk program to generate the for the English-Russian dictionary as well? Russian nouns can have two or even three genders and have a gender and be plural. The change applies to other languages as well. --Anatoli (обсудить/вклад) 21:14, 6 May 2013 (UTC)
- Ok I will fix that immediately. I've overlooked the n-p gender as it doesn't show up in Category:Gender and number templates, however there seems to be no template {{m-f}}. Is there any other new gender template that I forgot about? Matthias Buchmeier (talk) 23:05, 6 May 2013 (UTC)
- Sorry, it must only apply to gender + plural, not mixed genders :). I've checked Category:Russian nouns with multiple genders and Category:Russian proper nouns with multiple genders, they still use mf, mn, fn and mfn parameters (there are no templates for them). Perhaps need to watch out for any changes? --Anatoli (обсудить/вклад) 23:20, 6 May 2013 (UTC)
- Sorry, I missed that development and not sure exactly what's going on. Some nouns still use "np" (neuter plural), see дрова. I don't know who to ask but I saw you changing the program for Spanish, so suggested to change that. Don't change anything for the moment, we need to clarify what's happening. Sorry for the confusion. --Anatoli (обсудить/вклад) 23:25, 6 May 2013 (UTC)
Reinstating my edit (Trendsetter is not a translation of hipster)
Hi,
I've answered here: User_talk:Gronky#hipster. I wasn't 100% sure when I made my first edit, so I commented out the translations instead of straight deleting them, but I've now done a bit of checking and the English and German Wiktionaries indicate that my edit was right, Trendsetter isn't a correct translation of hipster. Gronky (talk) 03:23, 4 June 2013 (UTC)
Add to Italian dictionary and add a Friulan dictionary
Mandi Matthias, I am very new to this and have no idea how to do this myself so I have 2 requests. 1. Can you add "Primus" to the Italian dictionary. It means first and it is also a surname (It happens to be mine). 2. I am looking for a Friulan (Furlan) dictionary and would love it if it could be added. Friulan is spoken in the Friuli region of Italy. Thanks for any help you can give me :-) Monica65 (talk) 19:18, 9 June 2013 (UTC)
- Hi Monica, you can add your surname yourself. If you don't know how to format it, have a look at the wiki-code of some other Italian surnames in Category:Italian surnames (you can click the edit button at the top of the page to show the page's wiki-code). With regard to Friulan, it might be added as soon as somebody interested in that language joins the project. Matthias Buchmeier (talk) 14:43, 10 June 2013 (UTC)
Your index subpages
I just wanted to let you know that I've found your indexes (like User:Matthias Buchmeier/en-pt-a, etc.) helpful, but there is one thing I don't understand. Why do you include trans-see results in with the rest? I can't see what purpose they would serve, and they take up quite a bit of space on these pages. Ultimateria (talk) 21:52, 4 July 2013 (UTC)
- The idea behind adding the trans-see results is for use as an offline dictionary. But as the en-pt index is quite small, there are a lot of trans-see links without a Portuguese translation. I think that I will exclude the trans-see results for Portuguese with the next update. Matthias Buchmeier (talk) 16:37, 8 July 2013 (UTC)
Could you possibly add the German translation for this word to the translation table, please? Thanks, Razorflame 21:47, 9 August 2013 (UTC)
Hi MB. Do you think you could add an extra bit of code so as to create the vos forms of Spanish verbs? Something like llamá and llamás should do it. --Shegashega (talk) 01:32, 29 August 2013 (UTC)
- Hi WF. I could do that, but rather than hardcoding the forms into the bot I'd like to have them added to the conjugation tables, from where I could easily pull them without having to think about it. Matthias Buchmeier (talk) 16:44, 29 August 2013 (UTC)
- There's a conjugation template that does that with regular -ar verbs and one that could work for all verbs. However, and much to my chagrin, these are not yet put into the main template, but the vos additions could be easily implanted into that template, just that I can't do it as I'm not admin anymore. --Shegashega (talk) 17:04, 29 August 2013 (UTC)
- More news on this. I fiddled around with Template:es-conj-ar (e-ie), Template:es-conj-ar (andar) and Template:es-conj so these show up automatically. Do you think it's good? I'd like to add stuff to all the templates like Template:es-conj-ar eventually. But first see what the community says -WF
- It looks OK, although I'm not sure about the subjunctive forms. While eswiktionary lists the same forms as your template, and I know that those are the forms in use in Argentina/Uruguay, the DRAE doesn't list them for some reason. Maybe let's ask the community; they might know more about it. Matthias Buchmeier (talk) 20:41, 24 September 2013 (UTC)
- I've meanwhile looked up Wikipedia about the issue, and it agrees on the subjunctives, although those are regionally quite diverse, as are the other forms. I've also noted that your modification of Template:es-conj doesn't treat the vos forms as optional, so it will break all conjugation templates (quite a lot) that don't have the vos forms added yet. Matthias Buchmeier (talk) 20:52, 24 September 2013 (UTC)
- This could be a case of asking people who know both templates and Spanish. I'm comparatively crap with templates, despite my 8 years experience on the website. -WF
- Hi again. You may be interested to know that Template:es-conj-ar has finally been updated, and now allows the vos form. As a result, there are plenty of red links in the verb tables. Hoping that you implement any changes into the bot code. Thanks--ElisaVan (talk) 10:31, 20 October 2013 (UTC)
Finnish-English offline dictionary
Hello, I really appreciate your work in making the conversion script. I have a question about word declensions: would it be easy to parse them from a dump, or are they generated dynamically? This would help many people studying Finnish; I haven't seen any offline dictionary which provides declensions yet. Ilinov 86 (talk) 23:55, 27 October 2013 (UTC)
- Yes, they could be parsed from the dump, however I didn't include them as there are probably a lot and they might flood the dictionary. If you have the possibility to run the parser script you can try to modify the Finnish configuration to see what you get. Matthias Buchmeier (talk) 18:52, 28 October 2013 (UTC)
- I have done it in a slightly different way, I got a list of Finnish words from your dictionary and then parsed the website. I put my full text dumps here, it may be useful for people studying Finnish https://en.wiktionary.org/wiki/User:Ilinov_86 Ilinov 86 (talk) 13:45, 28 November 2013 (UTC)
- Would you mind posting the script used to generate these dictionaries? People interested in other languages might benefit from it. Matthias Buchmeier (talk) 19:32, 3 December 2013 (UTC)
- Yes, I can, but the problem is that there is no single script; I had a large number of gawk one-liners to clean up and format the text. And even after that I had to do some extra work by hand to compile the dictionary (though I think it worked OK with GoldenDict in Linux). The reason is that tens of pages contained some human-made mistakes in formatting. Well, probably I should have been more responsible and corrected them also in Wiktionary :( I can upload the scripts but it will require awk knowledge to adjust them for another language, though they may save some time. Ilinov 86 (talk) 09:18, 18 December 2013 (UTC)
- If you don't have the time or don't feel like correcting the misformatted entries, you could try to upload a list with some descriptions of what's wrong (if not obvious) under Wiktionary:Todo. Probably sooner or later someone might take care of it. Matthias Buchmeier (talk) 23:18, 6 January 2014 (UTC)
Feature request: Remove transliteration
First of all, congratulations on the wiktionary dumps, they are very useful. I just have one request. Could you add an option to remove the transliterations for non-Latin scripts? I am Greek, and I find the constant presence of transliterations very distracting. On a different note, I am using your dumps to create .dsl (goldendict, fora) versions of the dictionaries. Is that something that might interest you? They are plain text files, but the format offers some nice formatting options. My coding skills are very limited, I couldn't really follow your code and incorporate my changes into it. So, this is what I've got for now:
#!/bin/sh
# script copied and modified/improved from the
# DingMee Translator: https://chrm.info/cms/dingmee-translator
# GNU General Public License either version 3 of the License, or
# (at your option) any later version.
if [ $# -ne 2 ] ; then
  echo "Usage: $0 <INPUT> <OUTPUT>"
  exit
fi
INPUT=$1
OUTPUT=$2
sed -e '1s@^# \(.*\) :: \([^ ]*\).*$@\xef\xbb\xbf#NAME "\1-\2 Wiktionary"\n#INDEX_LANGUAGE "\1"\n#CONTENTS_LANGUAGE "\2"\n#SOURCE "https://en.wiktionary.org/wiki/User:Matthias_Buchmeier"\n#LICENSE "Distributed Creative Commons Attribution-Share Alike 3.0 Unported (https://creativecommons.org/licenses/by-sa/3.0/); GNU Free Documentation License"\n\n@' -e '2,5d' $INPUT > $OUTPUT
#Clean up Greek transliterations
perl -p -i -e "s@ */.*?/@@g" $OUTPUT
#Remove cross-references
perl -i -pe 's/^.* +SEE: +.*\n//g' $OUTPUT
#Make goldendict entries
perl -p -i -e "s@(.*?) +:: +(.*)@\1\n\2\n\t\[i\]\[c gray\]\1\[/c\]\[/i\]\n\t\\\ \\\ \\\ \\\ \2@g" $OUTPUT
perl -i -pe '!/\t/ && s/, +/\n/g' $OUTPUT
perl -i -pe '!/\t/ && s/ *[\{\[\(].*?[\}\]\)] *//g' $OUTPUT
echo "Ready. Generated file: $OUTPUT"
- My code is really a bit difficult to understand because it has been written in a very ad hoc way. This is because the formatting of the wiki-code is not yet well settled and changes quite frequently, so it doesn't make much sense to write more structured code. I could add an option to remove the transliterations, that shouldn't be difficult, but it should also be easy to remove them later on, as with your script. Does your script above still have any problems in doing so? (I'm not a perl expert, so you probably have to try and debug it yourself). Matthias Buchmeier (talk) 20:01, 18 November 2013 (UTC)
- Thanks for the reply. My script simply removes everything it finds inside pairs of slashes (/). (It is just the one line headed with «Clean up Greek transliterations»). I did find errors, which was unavoidable, because slashes could appear for other reasons (then again, it might just be bad wiki formatting). Anyway, I am thinking that in the original dump the transliterations might be more unequivocally shown, which would make their removal less error-prone. Jenniepet (talk) 23:36, 18 November 2013 (UTC)
- Slashes should be rather rare, and if you limit the characters allowed to appear inside transliterations there should hopefully be no more errors. Anyhow, it would indeed be nice to mark transliterations more unequivocally; do you have any suggestions? For now I've excluded the Greek transliterations. The dictionary update should be finished in a couple of hours. Matthias Buchmeier (talk) 23:57, 18 November 2013 (UTC)
- I think there is a misunderstanding here. What I meant is, in the original Wiktionary you get e.g. {\{t+|el|τρένο|n|tr=treno}\}. In your dictionary, you see τρένο /tréno/. If you want to remove the transliteration using a regex, the first version is more unequivocal and much easier to work with. But I do think that putting the transliterations inside slashes is a nice way of presenting them in the final document. P.S. Should I have placed this comment at the bottom of the thread? I'm not sure. 83.217.147.107 16:10, 19 November 2013 (UTC)
- You may need to discuss this with the community. What is bad for a single person, may be quite useful for the majority. I personally don't find transliteration distracting or useless, especially for languages I don't speak or know very little about. --Anatoli (обсудить/вклад) 23:51, 18 November 2013 (UTC)
- This is precisely why I proposed having an option to remove them, exactly like with pronunciations. I think there are cases where they can be very useful. For example, for people who don't have a Greek font or Greek keyboard installed. Also, I think that in languages such as Mandarin, transliterations are used as a pronunciation tool, even for native speakers. On the other hand, I've never seen a Greek dictionary (bilingual or monolingual) containing transliterations. So, I think it is safe to say that most users of an English-Greek dictionary will not be expecting transliterations. And now I'm curious whether there are any Russian dictionaries that show transliterations. 83.217.147.107 16:10, 19 November 2013 (UTC)
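The suggestion made in this thread, to limit which characters may appear between the slashes, could look roughly like the following Python sketch. The character set is an assumption (Latin letters with diacritics, spaces, hyphens and apostrophes) and would need widening for romanization schemes that use other symbols; `strip_transliterations` is a hypothetical helper name.

```python
import re

# Assumption: a transliteration is a slash-delimited run of Latin letters
# (precomposed or with combining diacritics), spaces, hyphens and
# apostrophes, preceded by at least one space.
TRANSLIT_RE = re.compile(r" +/[A-Za-z\u00C0-\u024F\u0300-\u036F '\-]+/")


def strip_transliterations(line):
    """Remove ' /translit/' groups, leaving other slashes untouched."""
    return TRANSLIT_RE.sub("", line)
```

For instance, `strip_transliterations("train :: τρένο /tréno/ {n}")` drops the `/tréno/` part, while a slash between two Greek words is left alone because Greek letters are not in the allowed set.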
Russian translations with stress marks
Hi,
I noticed a new problem with the Russian translations. Translations automatically remove stress marks (and can transliterate automatically). In User:Matthias_Buchmeier/en-ru-a terms with stress marks link incorrectly, e.g. абба́тство, instead of абба́тство. Do you think you can fix it? --Anatoli (обсудить/вклад) 11:59, 19 November 2013 (UTC)
- That should be quite easy but it requires some Cyrillic character replacement list. Do you know which module actually performs that task? Matthias Buchmeier (talk) 17:58, 19 November 2013 (UTC)
- Never mind, I've already found it in Module:languages/alldata. I'll see what I can do. Matthias Buchmeier (talk) 18:28, 19 November 2013 (UTC)
- Sorry, is it not Module:links? --Anatoli (обсудить/вклад) 23:48, 19 November 2013 (UTC)
- The work is done by Module:links, but the data is pulled out of Module:languages/alldata.Matthias Buchmeier (talk) 16:46, 20 November 2013 (UTC)
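A minimal Python sketch of the kind of replacement discussed in this thread: deriving the link target from a stressed Russian display form. It assumes that only the combining acute and grave accents need to be removed; the authoritative list is the one in Module:languages/alldata mentioned above, and `link_target` is a made-up name.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent, used as the stress mark
GRAVE = "\u0300"  # combining grave accent (assumed: secondary stress)


def link_target(display_form):
    """Return the entry name for a stressed Russian display form."""
    # Decompose first so precomposed accented letters are handled too.
    decomposed = unicodedata.normalize("NFD", display_form)
    stripped = decomposed.replace(ACUTE, "").replace(GRAVE, "")
    # Recompose so that letters like й and ё keep their usual form.
    return unicodedata.normalize("NFC", stripped)
```

Removing only these two marks, rather than every combining character, matters because й and ё also decompose into a base letter plus a combining mark and must stay intact.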
Meaning of tranquilícese
Hi, you created the entry for tranquilícese and defined it as "Compound of the formal second-person singular (usted) imperative form of tranquilizar, tranquilice and the pronoun se" – but what does the word mean in English? O'Dea (talk) 19:23, 29 November 2013 (UTC)
- It means calm down!. I've added the meaning to the page.Matthias Buchmeier (talk) 16:44, 2 December 2013 (UTC)
- Gracias. O'Dea (talk) 03:07, 6 December 2013 (UTC)
Feeding the bot
Hi, since I can't edit user subpages, I'm feeding the bot here with Spanish verbs ocluir and derruir. --ElisaVan (talk) 10:46, 30 November 2013 (UTC)
- OK, I've moved the feedpage to the talk section. I hope you can edit it now. As soon as I find the time to adapt the bot code to the new format with vos-forms I'll run it and probably also create some vos forms. Matthias Buchmeier (talk) 16:52, 2 December 2013 (UTC)
- PS common one-off verbs like hacer still need vos in their conj-tables (and the vos forms should be created). Thanks! —Μετάknowledgediscuss/deeds 02:26, 3 December 2013 (UTC)
Special characters
Hi Matthias, thank you for generating the English-Hungarian index. It is very useful, especially since the Index:Hungarian was not refreshed since 4/28/2012. Just one comment. I've noticed that when the translation table contains a word ending in a special character (such as mi? = 'what?'), the translation table correctly links to the word without the question mark mi, but your list points to the word+question mark combination which will not exist. Previously, I used the {t|hu|mi|alt=mi?} format in the trans table, but today I discovered that the {t|hu|mi?} format correctly links to 'mi'. Not sure how this would impact the Index:Hungarian if it is ever refreshed, or if this is something you want to consider in your code. Thanks again. --Panda10 (talk) 15:15, 21 December 2013 (UTC)
- The reason is the same as discussed above in thread 'Russian translations with stress marks'. Links are now generated via a Lua module on a per language basis. I can generate the correct links for Hungarian, when I find some time to copy the corresponding replacement lists. Matthias Buchmeier (talk) 19:46, 21 December 2013 (UTC)
Asturian index
Hi MB, happy new year to you. Would it be possible for you to create an index of Asturian entries, like the ones you've made for other languages. It'd be useful to see if any new entries can be created, or are missing, from the translations sections. Thanks and keep up the great work! --ElisaVan (talk) 13:55, 31 December 2013 (UTC)
multilingual dictionaries, awk script discussion
Hi again. First of all, thanks for including the transliteration option in the awk script.
I have been working on it, so here are a few things that caught my eye:
lang=="Indonesian|Standard Indonesian|Stabdard";
replace with:
lang="Indonesian|Standard Indonesian|Standard";
lang_qualifier="|Serbain|Bosnian|Croatian";
replace with:
lang_qualifier="|Serbian|Bosnian|Croatian";
TR=gensub(regexp, "\\1\\5 {{\\4}}", "g", TR);
regexp = "(\\{\\{t\\|("iso")\\|[^}]*)(\\|)([mfnspc])(\\}\\}|\\|[^}]*\\}\\})";
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
replace with:
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
regexp = "(\\{\\{t\\|("iso")\\|[^}]*)(\\|)([mfnspc])(\\}\\}|\\|[^}]*\\}\\})";
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
I have modified the script to give me machine-readable output so that I can modify it further with something that isn't awk. But I also tried to get it to produce multilingual dictionaries. The problem was that when I specified LANG="French,German" your default exclude_lang options didn't work cumulatively, and I ended up with entries for Old High German, etc. I worked around this issue by separating the lang_exclude from the other language specific options and using this syntax:
if(lang ~ /French/) exclude_lang = exclude_lang "|French Creole|Old French|Middle French|Gallo";
instead of the original:
if(lang=="French") exclude_lang = "French Creole|Old French|Middle French|Gallo";
I actually tried to do the same with the other language related parameters (iso, lang_qualifier, etc) but it wouldn't work. So, I kept only exclude_lang. I think this change is something that you might be interested in incorporating, so I'm including my code in the end.
One problem with my approach is that sometimes a language is found both in exclude_lang and LANG or ISO. At the moment, exclude_lang overrides LANG, but it should be the opposite and I don't know how to add code for that. (I can always use manual excludes I guess).
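The two issues described above, accumulating exclusions across several requested languages and letting an explicitly requested language win over an exclusion, can be sketched in Python as follows. This is not the awk script's code: the `EXCLUDES` table, the function names and the German list are hypothetical, with only the French list taken from the example above.

```python
# Hypothetical per-language exclusion lists; only the French one
# comes from the discussion, the German one is invented.
EXCLUDES = {
    "French": ["French Creole", "Old French", "Middle French", "Gallo"],
    "German": ["Old High German", "Middle High German", "Low German"],
}


def build_excludes(requested):
    """Collect exclusions for every requested language, then let the request win."""
    excluded = []
    for lang in requested:
        for name in EXCLUDES.get(lang, []):
            if name not in excluded:
                excluded.append(name)
    # Anything explicitly requested is never excluded.
    return [name for name in excluded if name not in requested]


def keep(language_name, requested, excluded):
    """Keep a translation if its language matches a request and is not excluded."""
    if language_name in excluded:
        return False
    # Substring test, analogous to the awk `~` match used above.
    return any(lang in language_name for lang in requested)
```

With `requested = ["French", "German"]`, "Old High German" and "Old French" are both dropped. With `requested = ["German", "Low German"]`, "Low German" survives because the request takes precedence over the default exclusion.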
There's one thing I haven't understood. If I define LANG="French|German" apparently I don't need to also define ISO. But what happens if I define ISO="de|fr" or ISO="fr|de|el"?
Anyway, I'm closing with my updated script. If you're interested only in the exclude_lang syntax, just copy the first 250 lines. If you or anyone else is interested in using it as a whole, here's an example of the output:
bridge :: {n} {{sen*replacement for teeth}} :: {{t|fr|bridge|m}}, {{qual*Canada}} {{t|fr|pont|m}}; {{t|de|Brücke|f}}
And here's the script itself:
- I've removed this since it was taking far too much space. I'll probably publish my wiktionary2dsl script somewhere when it's finalised. In the meantime, you can email me for a copy. Jenniepet (talk) 09:15, 26 February 2014 (UTC)
Jenniepet (talk) 07:20, 11 February 2014 (UTC)
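To show how the machine-readable output quoted above might be consumed by something that isn't awk, here is a small Python sketch. It assumes the three-field `headword :: info :: translations` layout of the example line and plain `{{t|lang|word|...}}` templates; `parse_entry` is a hypothetical name, and qualifiers such as `{{qual*Canada}}` are simply ignored.

```python
import re

# {{t|lang|word}} with optional further parameters (t+ and t- also accepted).
T_TEMPLATE = re.compile(r"\{\{t[+-]?\|([^|}]+)\|([^|}]+)((?:\|[^|}]*)*)\}\}")


def parse_entry(line):
    """Split the line into its fields and group translations by language code."""
    headword, info, translations = [part.strip() for part in line.split(" :: ", 2)]
    by_lang = {}
    for lang, word, rest in T_TEMPLATE.findall(translations):
        extra = [p for p in rest.split("|") if p]  # e.g. gender codes
        by_lang.setdefault(lang, []).append((word, extra))
    return headword, info, by_lang


LINE = ("bridge :: {n} {{sen*replacement for teeth}} :: {{t|fr|bridge|m}}, "
        "{{qual*Canada}} {{t|fr|pont|m}}; {{t|de|Brücke|f}}")
headword, info, by_lang = parse_entry(LINE)
```

For the sample line this yields two French translations and one German one, each paired with its gender code.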
- I ran your script for French and it didn't move all instances of gender outside the brackets. I think your relevant code is this:
# regexp = "({{(t|t[+]|t[-]|tø)\\|("iso")\\|[^\\|}]*)(\\|)([mfnspc])([^}]*}})";
regexp = "(\\{\\{t\\|("iso")\\|[^}]*)(\\|)([mfnspc]\\|[mfnspc])(\\}\\}|\\|[^}]*}})";
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
regexp = "(\\{\\{t\\|("iso")\\|[^}]*)(\\|)([mfnspc])(\\}\\}|\\|[^}]*\\}\\})";
TR=gensub(regexp, "\\1\\5 {\\4}", "g", TR);
- But I couldn't figure out how to correct it, so I added this instead:
$0 = gensub(/(\{\{t\|[^}]*)(\|(impf|pf))(\}\}|\|[^}]*\}\})/, "\\1\\4 {\\3}", "g", $0);
#jen added 3 lines to move gender, number outside t-template
$0 = gensub(/(\{\{t\|[^}]*)\|([mnfcsp]\|[mnfcsp]\|[mnfcsp])(\}\}|\|[^}]*\}\})/, "\\1\\3 {{\\2}}", "g", $0);
$0 = gensub(/(\{\{t\|[^}]*)\|([mnfcsp]\|[mnfcsp])(\}\}|\|[^}]*\}\})/, "\\1\\3 {{\\2}}", "g", $0);
$0 = gensub(/(\{\{t\|[^}]*)\|([mnfcsp])(\}\}|\|[^}]*\}\})/, "\\1\\3 {{\\2}}", "g", $0);
- I also noticed that there were some sub,sup,u,small html tags in some dictionaries.
- Other things you might want to take into account are below. I used Perl to remove this left-over formatting:
perl -i -pe 's/<nowiki>([\[\]\{\}])<\/nowiki>/\\\1/g' $FILE
perl -i -pe 's/\{\{unsupported\|//g' $FILE
perl -i -pe 's/\{\{,\}\}/,/g' enwikta.temp
perl -i -pe 's/\{\{NNBS\}\}/ /g' enwikta.temp
perl -i -pe 's/\{\{\qualifier, see also\: /\{\{qualifier/g' enwikta.temp
perl -i -pe 's/.*\{\{not used\|...?\}\}.*\n//g' enwikta.temp
- Jenniepet (talk) 09:37, 26 February 2014 (UTC)
- Hi Jenni, thanks a lot for your feedback. I know that there are issues with gender templates, which have changed a lot over the last couple of weeks. I didn't yet have the time to look at this, but will certainly correct these problems sooner or later. The other fixes are instances of very rarely used templates. The problem with this is that there is a huge zoo of possible templates which might be used occasionally and it requires a lot of work to keep track of all of them. I will also include these proposed fixes before the next dictionary update. Matthias Buchmeier (talk) 18:29, 26 February 2014 (UTC)
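The gender-moving step discussed in this thread can also be written as a single substitution. The Python sketch below is an alternative illustration, not a translation of either awk version: it assumes the gender/number codes directly follow the word parameter, joins multiple codes with a hyphen (an arbitrary choice), and uses the single-brace `{m}` output style. `move_gender_out` is a made-up name.

```python
import re

GENDER = r"(?:impf|pf|[mfncsp])"
T_WITH_GENDER = re.compile(
    r"(\{\{t[+-]?\|[^|{}]+\|[^|{}]+)"   # {{t|lang|word
    r"((?:\|" + GENDER + r")+)"          # one or more gender/number codes
    r"((?:\|[^{}]*)?)\}\}"               # any remaining parameters, then }}
)


def move_gender_out(text):
    """Turn {{t|fr|pont|m}} into {{t|fr|pont}} {m}."""
    def repl(match):
        codes = match.group(2).strip("|").replace("|", "-")
        return match.group(1) + match.group(3) + "}} {" + codes + "}"
    return T_WITH_GENDER.sub(repl, text)
```

Matching the language and word parameters explicitly keeps a one-letter word such as `p` from being mistaken for a gender code. Templates where another parameter precedes the gender (for example `tr=` before `n`) are left unchanged by this sketch.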
Bot work
Hey Matt, are you planning on running BuchmeierBot for Spanish verb forms any time soon? --Type56op9 (talk) 17:36, 9 July 2014 (UTC)
- I haven't yet had the time to adapt the bot code to include the voseo forms, but I will definitely run the bot in the near future. Matthias Buchmeier (talk) 16:17, 10 July 2014 (UTC)
Pursuant to the RfD discussion, I have restored television show. As you supported this restoration, please improve this entry through the addition of citations supporting the definitions provided and any other materials that would demonstrate its value to the corpus. Cheers! bd2412 T 15:44, 1 August 2014 (UTC)
Greetings, Matthias Buchmeier. AFAICT from my use of the online Duden, German has three main forms that correspond in meaning to the English Nazarite (and its numerous forms). Judging by de:w:Nasiräer, Nasiräer is the form preferred, with Nazaräer and Nazoräer being secondary. Can you tell me why this is, please? The etymological form would surely be *Naziräer; does this form not exist? And if it does, why isn't it the preferred form? Thanks in advance for any light you can shed on this for me. — I.S.M.E.T.A. 02:21, 27 October 2014 (UTC)
- I'm not sure. I only know about Nazaräer, which is the only form I ever heard about. It's from church/bible context and I'm pretty sure it's derived from Nazaräa, which is the German name of the place. I guess that the latter is of Latin origin; compare e.g. Spanish Nazarea. There doesn't seem to be a tilde on my Android keyboard, so I guess I'm not able to sign :-( — This unsigned comment was added by Matthias Buchmeier (talk • contribs) at 03:40, 27 October 2014.
- Hmmm. I decided to create entries for those three terms, largely based on their entries in the Duden. The results are surprising: The Duden doesn't record Nazaräer or Nazoräer being used in the sense of "Nazarite"; it states that it's only Nasiräer that's used that way. This seems to be contrary to your experience. Would you mind fact-checking the entries I've created, please? — I.S.M.E.T.A. 22:28, 27 October 2014 (UTC)
Spanish bot again - unfinished entries
Hi there. I added a bit of code to Template:es-conj-ar, which made it possible to pick up any verbs which still have red links in the conjugations table. I thought you might want to process these the next time that you run the bot for Spanish verbs. Mostly the red links in question are the past participle forms, as well as the "vos" forms. I'll do the same for -ir and -er verbs. --Type56op9 (talk) 12:00, 22 December 2014 (UTC)
- I split them up into three groups - Ones with many red links (probably all red - the most useful category), ones with voseo red links, and Category:Spanish verbs having past participle red links in their conjugation table. Hope this helps, and I'm sorry if I broke anything. --Type56op9 (talk) 12:09, 22 December 2014 (UTC)
Hi Matt. Could you please unprotect Template:es-adj? There's a little change I'd like to make to it. I explained it at Template talk:es-adj. --SuperWonderbot (talk) 17:57, 17 January 2015 (UTC)
Spanish bot
Hi Matt. Have a look at User:Type56op9/bot code. It is a page I made ready for bot entries for Spanish verb forms. Feel free to use it the next time you're running the bot! Regards --Type56op9 (talk) 14:05, 28 January 2015 (UTC)
- Thanks WF, I don't have broadband internet at the moment, but once I get it I will run the bot. Matthias Buchmeier (talk) 17:35, 28 January 2015 (UTC)
Spanish German
Hi! What you are doing here with regard to vocabulary resources is highly commendable and long overdue. Cruelly, there still seems to be no free vocabulary source for Spanish<->German at all, hence my question, since I do not fully see through it yet: could you create that one analogously to the others? Regards. --Itu (talk) 07:48, 7 May 2015 (UTC)
- I have not done this for other language combinations yet, because there are a great many possible combinations and anyone who uses Linux/Unix can easily do it themselves with this script. If you tell me exactly which dictionaries you are interested in, I can compile them for you along with the next batch. Matthias Buchmeier (talk) 11:31, 7 May 2015 (UTC)
- Hm, I am a bit stuck here: what is supposed to be entered for $DICT1 and $DICT2? I first need a suitable file on my computer; where do I find or get it?
- As I said, what I want is a dictionary to show Spanish words in German, and secondarily the reverse direction as well. --Itu (talk) 04:46, 9 May 2015 (UTC)
- Oh, sorry. I thought Spanish-German was already in the package, but I was mistaken. The above script can, for example, generate Spanish-German from English-Spanish and English-German (in ding format). The latter two dictionaries are already included in the package (see the snapshot link in User:Matthias Buchmeier/download). I hope that helps. Extracting from the German or Spanish Wiktionary unfortunately makes no sense at the moment, because the translations there are far too incomplete and dictionaries made from them would be fairly useless. Matthias Buchmeier (talk) 15:10, 9 May 2015 (UTC)
únete
You created a definition of únete, describing it as "informal second-person singular (tú) affirmative imperative form of unir, une and the pronoun te".
This may be a technical-grammatical description, but it is not a dictionary translation of the Spanish into English. Can you add a translation, so that readers searching for insight can get past the technical jargon to an actual understanding of the word, please? O'Dea (talk) 14:07, 25 August 2015 (UTC)
Which version of dump is the bot using?
The edit summary says "automatic upload of files generated from 20151102 database dump", but I find material that has not been in entries for nearly 13 months. Are you sure you are using the latest dump? Could there be some kind of glitch in the dump-generation process?
This came to my attention because of the current content of Category:Entries using the taxlink template, which I regularly clean up. Compare the current content of forelli with what appears in your fi-en-f. Also, look at the edit history for forelli. DCDuring TALK 02:27, 8 November 2015 (UTC)
- Yes, there are a few glitches in my dictionary creation script. I'm trying to remove all templates, but it's difficult to keep up with the ever-changing template zoo. Sorry, I'll fix those taxlink issues with the next dump. Matthias Buchmeier (talk) 11:21, 8 November 2015 (UTC)
- What I find curious is that it seems the only instances of {{taxlink}} that appear in your many pages of results are those that do not have "noshow=1" and that a look at the edit history for the corresponding entry shows that "noshow=1" was present for quite some time or that {{taxlink}} was never present without "noshow=1". It is the most minor of annoyances that the glitch causes me, nothing of substance. I was wondering more about whether in all regards you were actually using the dump your edit summary said it was, but something else must be going on. A regex problem? DCDuring TALK 15:28, 8 November 2015 (UTC)
- Yes, it was a regex problem. My error. I've now fixed it. Thanks for pointing that out to me. cheers Matthias Buchmeier (talk) 17:00, 8 November 2015 (UTC)
- I find it too easy to have a regex problem. I overestimated what I could reliably do with regexes. But the more expertise one gains, .... DCDuring TALK 18:25, 8 November 2015 (UTC)
- That's absolutely true. I oftentimes stumble over unexpected regex behaviour too. Matthias Buchmeier (talk) 19:03, 8 November 2015 (UTC)
- Thanks: AFAICT, the problem that concerned me has been addressed. DCDuring TALK 17:08, 25 November 2015 (UTC)
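The thread does not show the regex that was at fault, so the following is only a hedged sketch of one way to strip a template call so that optional parameters such as "noshow=1" do not change the outcome. It assumes the first positional parameter of {{taxlink}} is the taxon name to keep; the sample strings are made up for illustration.

```python
import re

# Matches {{taxlink|<name>|<any further parameters>}}; the further parameters are optional.
TAXLINK = re.compile(r"\{\{taxlink\|([^|{}]+)(?:\|[^{}]*)?\}\}")

def strip_taxlink(wikitext):
    """Replace each taxlink call with its first positional parameter."""
    return TAXLINK.sub(lambda m: m.group(1).strip(), wikitext)

# Illustrative inputs: the same call with and without the optional flag.
with_flag = "a fish, {{taxlink|Salmo trutta|species|noshow=1}}"
without_flag = "a fish, {{taxlink|Salmo trutta|species}}"

print(strip_taxlink(with_flag))
print(strip_taxlink(without_flag))
```

Making the trailing parameters a single optional group means both variants go through the same code path, which is the property the discussion above was missing.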
Hey MB. As the creator of the template Template:es-compound of, could I ask you to take a look at this comment by SGB, and see if you can incorporate some of that into Template:es-compound of? Danke --Stubborn Pen (talk) 17:10, 10 January 2016 (UTC)
- PS, I'll do my best not to type "adejctive" any more. --Stubborn Pen (talk) 17:12, 10 January 2016 (UTC)
- OK, Thou shalt be forgiven. To pay penance you can clean up these pages (mostly yours):
patibulario plutarquiano ponderado ponderado precitado reasegurador rezandero richardsoniano sciolista secretista semioculto sociorracial somatosensorial subjetivista Matthias Buchmeier (talk) 20:39, 10 January 2016 (UTC)
English-Basque dictionary
Dear Matthias,
I'm looking for a Basque-English dictionary. Do you know how I could get one? I tried to use the gawk script with enwiktionary-20160901-pages-articles.xml.bz2 and language=Basque, but I get the following error:
>bzcat data/enwiktionary-20160901-pages-articles.xml.bz2|gawk -v LANG=Basque -v ISO=iso-code -f trans-en-es.awk|sort -d -k 1,1 -t"{">en-xx.wiki
sort: string comparison failed: Illegal byte sequence
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were `FrenchProven\303al ' and `pronounceableness '.
Thanks for your help,
--Chloebt (talk) 14:50, 8 September 2016 (UTC)
- Hi Chloebt, I could build you one from English Wiktionary, but I fear that Basque coverage is very poor at the moment and it won't be of much use. I know that there are good free Spanish-Basque dictionaries on the internet, but my educated guess is that practically everyone learning Basque speaks Spanish and everyone speaking Basque speaks Spanish too (although some might pretend they don't), so the chances of finding a Basque-English one are poor. Matthias Buchmeier (talk) 19:29, 8 September 2016 (UTC)
- Hi Matthias, thank you very much for your answer! It would be very helpful for me if you could build this Basque-English Wiktionary, even if the coverage is far from perfect. It could be interesting later to compare with a system first translating to Spanish (but I would need a Spanish Basque dictionary). In fact, I know that a team built an English-Basque lexicon (IXA group), but I can't find it for now. --Chloebt (talk) 11:44, 9 September 2016 (UTC)
- OK, I'll see if I find time to compile the dictionary tomorrow or Sunday. You could also try it yourself again. Typing export LC_ALL='C' in the terminal before you start the compilation will fix the sorting error. You must also provide the ISO code for Basque via -v ISO=eu. This outputs an English-Basque dictionary from the translation sections of the English entries. You can also compile a Basque-English dictionary from the Basque entries using the second script provided in User:Matthias Buchmeier/trans-en-es.awk, which will give you additional translations. Matthias Buchmeier (talk) 19:03, 9 September 2016 (UTC)
- Fantastic, I'm going to try right now! I'll let you know how it goes. Thanks again.--Chloebt (talk) 19:37, 9 September 2016 (UTC)
- Thank you again, it worked very well.
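Why the LC_ALL='C' workaround helps can be illustrated with a small Python analogue: comparing raw bytes never decodes the text, so an invalid UTF-8 sequence cannot make the comparison fail. This is a sketch under assumptions, not a reimplementation of sort; it ignores the -d (dictionary order) option, and the sample lines are invented around the two strings from the error message.

```python
def sort_bytewise(lines, delimiter=b"{"):
    """Sort raw byte lines by the field before the first delimiter.

    Bytes are compared by value, so no decoding (and no decoding error) occurs.
    """
    return sorted(lines, key=lambda line: line.split(delimiter, 1)[0])

lines = [
    b"pronounceableness {n} :: example\n",
    b"FrenchProven\xc3al {n} :: example\n",  # \xc3 without a continuation byte: invalid UTF-8
    b"apple {n} :: example\n",
]
ordered = sort_bytewise(lines)
for line in ordered:
    print(line)
```

As with sort in the C locale, uppercase ASCII letters order before lowercase ones, so the result differs from a locale-aware dictionary order.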
Source of subtitles data
Hello, first let me say thank you for the Italian frequency lists. I'm interested in the source of the data; is it available somewhere? Do you have a list of the movies included in the subtitles? I see some of the words are not really Italian; perhaps they are character names from some series. --Sumail (talk) 18:26, 30 October 2016 (UTC)
- Hi @Sumail, sorry for the late response. The source of the data is online subtitle files. I should still have it on a backup drive; if you're still interested I could pull it out for you. I also have a list of files. Of course most are US movies and series, but there are also some Italian ones. Cheers Matthias Buchmeier (talk) 11:19, 13 November 2016 (UTC)
- If that's not a problem, it would be interesting. Thanks! --Sumail (talk) 13:31, 17 November 2016 (UTC)
Purpose of edits
Was there any real reason for your edits to fishing vessel, hairdressing salon, rev counter, and iron(III) chloride? DonnanZ (talk) 17:52, 27 April 2017 (UTC)
- I think that there is absolutely no reason why an l-template should be nested inside a trans-see template, so it's an unnecessary complication. If someone wanted to link to the English section then that functionality should rather be implemented in the trans-see template. Matthias Buchmeier (talk) 19:33, 27 April 2017 (UTC)
- So that's the only reason? No technical problems? I was creating a more direct link which the trans-see template fails to do, if there are no technical problems I will revert your edits. DonnanZ (talk) 20:14, 27 April 2017 (UTC)
- Technical problems will most likely arise if someone someday tries to update these templates with a bot. Matthias Buchmeier (talk) 21:13, 27 April 2017 (UTC)
- I have raised the matter of improving {{trans-see}}, which would be the best solution, in the Grease Pit. DonnanZ (talk) 22:02, 27 April 2017 (UTC)
Share your experience and feedback as a Wikimedian in this global survey
Hello! The Wikimedia Foundation is asking for your feedback in a survey. We want to know how well we are supporting your work on and off wiki, and how we can change or improve things in the future.[survey 1] The opinions you share will directly affect the current and future work of the Wikimedia Foundation. You have been randomly selected to take this survey as we would like to hear from your Wikimedia community. To say thank you for your time, we are giving away 20 Wikimedia T-shirts to randomly selected people who take the survey.[survey 2] The survey is available in various languages and will take between 20 and 40 minutes.
You can find more information about this project. This survey is hosted by a third-party service and governed by this privacy statement. Please visit our frequently asked questions page to find more information about this survey. If you need additional help, or if you wish to opt-out of future communications about this survey, send an email to surveys@wikimedia.org.
Thank you! --EGalvez (WMF) (talk) 22:25, 13 January 2017 (UTC)
- ^ This survey is primarily meant to get feedback on the Wikimedia Foundation's current work, not long-term strategy.
- ^ Legal stuff: No purchase necessary. Must be the age of majority to participate. Sponsored by the Wikimedia Foundation located at 149 New Montgomery, San Francisco, CA, USA, 94105. Ends January 31, 2017. Void where prohibited. Click here for contest rules.
Attempt to create a French-Spanish dictionary
Hello Matthias,
First of all, thank you for your incredible work!
I tried to follow your instructions to create a fr-es dictionary (on Linux) and, without tweaking, I get this (pastebin link). If I replace these two lines:
# lineDEST=`grep "$lineEN" $DICT2 | sed "s/.*::\ \(.*$\)/\1/g"`
lineDEST=`grep "$termEN.*$glossEN" $DICT2 | sed "s/.*::\ \(.*$\)/\1/g"`
by this :
lineDEST=`grep -F "$lineEN" $DICT2 | sed "s/.*::\ \(.*$\)/\1/g"`
# lineDEST=`grep "$termEN.*$glossEN" $DICT2 | sed "s/.*::\ \(.*$\)/\1/g"`
I get a dictionary of 35824 lines (which seems nice). I would prefer to use the more robust line but I don't see how to make it work. Is there something that I missed ? I'm using GNU grep 2.20 (bundled in a Debian Jessie).
Thank you for your help,
Jona (talk) 22:14, 31 May 2017 (UTC)
- Dear Jona, thank you for the bug report. It turns out that the script doesn't work for dictionaries with wikilinks. I hadn't tried that out until now, and most likely neither had the original author. I don't have any clue yet where the problem is, but if you don't need the wikilinks you can remove them with the -v REMOVE_WIKILINKS="y" option when building the en-fr and en-es dictionaries. I hope that helps. Cheers Matthias Buchmeier (talk) 22:49, 2 June 2017 (UTC)
- I think I found the problem: grep treats the search strings as regular expressions, i.e. the square brackets in $termEN are interpreted as character lists. I'll try to fix the script tomorrow. Matthias Buchmeier (talk) 23:13, 2 June 2017 (UTC)
- The script should be fixed now. Matthias Buchmeier (talk) 15:46, 4 June 2017 (UTC)
- It works great, thank you ! Jona (talk) 20:33, 4 June 2017 (UTC)
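The diagnosis above (square brackets in a search term being read as a character list) can be reproduced in miniature with Python's re module, which parses this particular pattern the same way. This is an illustrative sketch only; the two sample lines are made up and do not come from the real dictionaries.

```python
import re
import warnings

lines = [
    "[[dog]] {n} (animal) :: [[perro]] {m}",
    "[[frog]] {n} (amphibian) :: [[rana]] {f}",
]
term = "[[dog]]"  # a headword that still carries wikilink brackets

with warnings.catch_warnings():
    # Python itself warns that "[[" inside a pattern looks suspicious.
    warnings.simplefilter("ignore", FutureWarning)
    # As a pattern this is the character class [[dog] followed by a literal "]",
    # so it also matches the "g]" in the frog line.
    regex_hits = [l for l in lines if re.search(term, l)]

# Escaping the term, or using a plain substring test, treats the brackets literally.
literal_hits = [l for l in lines if re.search(re.escape(term), l)]
substring_hits = [l for l in lines if term in l]

print(len(regex_hits), len(literal_hits), len(substring_hits))
```

The substring test plays the role that grep -F plays in the shell: no character in the search term has any special meaning.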
The word "palomita" in Argentina?
Hello. I know this was in 2009 (almost 2010), but yesterday I saw you edited the page palomita and put that this word in Argentina means "bow tie". Is that checked? I mean, I am from Argentina and I have never heard anyone use the word "palomita", only moño (or "moñito"). I have even looked for this word on the Internet, but I couldn't find anything. Is it a regional thing? --Julian_L (talk) 04:24, 5 March 2018 (UTC)
- I must have looked it up, but I don't remember where. Looking into that again, it seems that palomita in the clothing sense more commonly refers to a collar type, "cuello palomita". Matthias Buchmeier (talk) 07:01, 5 March 2018 (UTC)
- I just looked for that, and it seems that "cuello palomita" and "cuello de paloma" are the Spanish equivalents for what people call a wing collar in English, but still no results for "bow tie". --Julian_L (talk) 07:32, 5 March 2018 (UTC)
- It seems to be not attestable, so I've removed that meaning. Matthias Buchmeier (talk) 08:30, 5 March 2018 (UTC)
- I understand. And what do you think about the words "cuello palomita" and "cuello de paloma" as translations for "wing collar"? Do we have the sufficient information to confirm that? Anyway, thank you for your attention!--Julian_L (talk) 09:41, 5 March 2018 (UTC)
- Yes that's attestable. I get 901 hits for cuello palomita and 1400 for cuello de paloma on google books. Matthias Buchmeier (talk) 09:58, 5 March 2018 (UTC)
Share your experience and feedback as a Wikimedian in this global survey
Hello! The Wikimedia Foundation is asking for your feedback in a survey. We want to know how well we are supporting your work on and off wiki, and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation. You have been randomly selected to take this survey as we would like to hear from your Wikimedia community. The survey is available in various languages and will take between 20 and 40 minutes.
You can find more information about this survey on the project page and see how your feedback helps the Wikimedia Foundation support editors like you. This survey is hosted by a third-party service and governed by this privacy statement (in English). Please visit our frequently asked questions page to find more information about this survey. If you need additional help, or if you wish to opt-out of future communications about this survey, send an email through the EmailUser feature to WMF Surveys to remove you from the list.
Thank you!
Reminder: Share your feedback in this Wikimedia survey
Every response for this survey can help the Wikimedia Foundation improve your experience on the Wikimedia projects. So far, we have heard from just 29% of Wikimedia contributors. The survey is available in various languages and will take between 20 and 40 minutes to be completed. Take the survey now.
If you have already taken the survey, we are sorry you've received this reminder. We have designed the survey to make it impossible to identify which users have taken the survey, so we have to send reminders to everyone. If you wish to opt-out of the next reminder or any other survey, send an email through EmailUser feature to WMF Surveys. You can also send any questions you have to this user email. Learn more about this survey on the project page. This survey is hosted by a third-party service and governed by this Wikimedia Foundation privacy statement. Thanks!
Your feedback matters: Final reminder to take the global Wikimedia survey
Hello! This is a final reminder that the Wikimedia Foundation survey will close on 23 April, 2018 (07:00 UTC). The survey is available in various languages and will take between 20 and 40 minutes. Take the survey now.
If you already took the survey - thank you! We will not bother you again. We have designed the survey to make it impossible to identify which users have taken the survey, so we have to send reminders to everyone. To opt-out of future surveys, send an email through EmailUser feature to WMF Surveys. You can also send any questions you have to this user email. Learn more about this survey on the project page. This survey is hosted by a third-party service and governed by this Wikimedia Foundation privacy statement.
dictionary
Hi, please add the dictionaries in mdx format.
- What do you mean by mdx? Is it a dictionary format? Or do you need a dictionary for the Dizin language? Matthias Buchmeier (talk) 13:09, 20 April 2018 (UTC)
mdx is a dictionary format that opens with MDict.
- I didn't find any documentation for the format, just a Windows binary tool to create the mdx dictionaries. Do you have documentation on how the mdx dictionary files should be formatted? Matthias Buchmeier (talk) 16:20, 20 April 2018 (UTC)
See this address: https://www.mdict.cn/wp/?page_id=5325&lang=en
- Sorry, but the builder tool is Microsoft Windows only. I don't have access to a Windows PC at the moment. Maybe you can build the mdx dictionaries yourself. I could help you with a script to convert the dictionaries to the proper input format for the mdx dictionary builder tool. Matthias Buchmeier (talk) 18:51, 20 April 2018 (UTC)
mistake
Hey. Once, 10 years ago, you made a mistake with the gender of despliegue. As we're in different countries, I can't do it, so please give yourself the required 50 (5 for each year the mistake goes unnoticed, according to the rules, IINM) thrashes of the lexi-whip. --Genecioso (talk) 13:30, 31 May 2018 (UTC)
- I already gave myself the 3 single thrashes for "adejctive", here and [9]. And my ass still hurts from the previous rounds of self-whipping. --Genecioso (talk) 13:37, 31 May 2018 (UTC)
- Hi WF, don't take my fixes of adejctives personal. I'm just trying to correct bad POS Headers whenever my wiki-parser spits them out; mainly in order to catch up with possible newly introduced POS headers. Matthias Buchmeier (talk) 16:18, 31 May 2018 (UTC)
- I'm not taking it personal. It's just that I always seem to type "adejctive" - seriously, it's really embarrassing that our fourth most prolific editor in history comes up with horsehit spelling mistakes. --Genecioso (talk) 17:47, 31 May 2018 (UTC)
- I think it's a typical typing error to mess up the order of letters typed with the right and left hands. It also happens to me when I try to type fast. I guess it's because the hands are controlled by different sides of the brain and there is a kind of communication bottleneck between the brain hemispheres. The key to avoiding those typos is to carefully check the result. I have to admit I'm often too lazy or distracted to notice my typos too. Matthias Buchmeier (talk) 18:45, 31 May 2018 (UTC)
- No worries, I'll correct yours and you can correct mine. Isn't that what wikis are all about, anyway? --Genecioso (talk) 13:55, 1 June 2018 (UTC)
Nikud at User:Matthias Buchmeier/en-he-a
Hi,
Atitarev (talk • contribs) showed me User:Matthias Buchmeier/en-he-a, which looks really useful!
If I may make a request . . . right now, when the translation is something like (e.g.) {{t+|he|הֶבֶל|m|tr=hevel}} (which expands to “הֶבֶל (he) m (hevel)”), then User:Matthias Buchmeier/en-he-a links to הֶבֶל rather than הבל: it doesn't know to remove the vowel marks in the linked page-name. I'm guessing you don't want to extend your bot to understand all the diacritic-removal rules from Module:languages/alldata, but maybe you could just use the original wikitext that the entry had, and let the appropriate template ({{t}} or {{t+}} or whatnot) handle the linking?
Alternatively, if the current format has a specific significance, maybe at least use {{l|he|...}} instead of [[...]]? ({{l}} uses the same underlying modules as {{t}} and {{t+}}, so it can handle the diacritic-removal in a consistent way.)
Thanks in advance!
—RuakhTALK 07:03, 15 June 2018 (UTC)
- Hi. Other options are perhaps no links at all, just "הֶבֶל" (because הֶבֶל is an incorrect link, הבל is correct), or templates that understand how to remove diacritics, e.g. {{m|he|הֶבֶל}}. That call will link correctly to הבל#Hebrew. The same or a similar approach would need to be applied to the English-Arabic dictionary, but with the correct language code. مِعَد (miʕad) is linked to معد#Arabic and مِعَد is an incorrect link. @Ruakh, Wikitiki89. --Anatoli T. (обсудить/вклад) 07:14, 15 June 2018 (UTC)
- Hi Atitarev, I know about those problems. Solving them is a bit tricky. I can't use {{l}} because it will throw Lua errors (not enough memory due to too many template calls). For some languages I have run the Lua template code locally on my PC with a standalone Lua interpreter (which requires some minor code tweaks) in order to get automatic transliteration and IPA. I haven't tried that out for Hebrew to remove the vocalization yet, but can do it as soon as I find some time. Do you know which templates will do this job? Do we have any Lua template experts who have used them with standalone Lua (without MediaWiki)? Matthias Buchmeier (talk) 08:41, 15 June 2018 (UTC)
- The Lua method that handles this is Language:makeEntryName in 'Module:languages'. My bot that updates {{t}}/{{t+}} does the following:
  - At the beginning of a run, to get the language data that makeEntryName uses:
    - It prefetches https://en.wiktionary.org/w/api.php?format=json&action=expandtemplates&prop=wikitext&text=%7B%7B%23invoke%3AUser%3ARuakh%7CformatDiacriticRemovalRulesAsJson%7D%7D (meaning "expand templates in the wikitext string {{#invoke:User:Ruakh|formatDiacriticRemovalRulesAsJson}} and give me the result").
    - It does JSON parsing, etc., as you'd expect.
    - Since makeEntryName uses Lua patterns, the bot applies a bunch of rules to translate the Lua patterns to Perl regexes. (Actually it's mostly just a bunch of validation rules to make sure that the patterns are in the subset of patterns that behave the same in Lua as in Perl. The only actual conversion is changing %1 to \1.)
  - Then, for each word that it needs to determine the entry-name for, it just applies the same logic that makeEntryName would. (Nothing magical here; I just reimplemented that method in Perl. Hopefully it hasn't changed since I did this . . .)
- Your idea of actually running the Lua locally seems brilliant, but I have no advice for how to manage that. :-P
- Since you don't actually do anything with the entry-name other than linking to the entry, one thing you could potentially try is to create a template that invokes makeEntryName, and then upload your result in pieces, using subst: to invoke that template just at upload-time. (Disclaimer: not tested.)
- —RuakhTALK 23:15, 15 June 2018 (UTC)
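The rule-driven approach described above can be sketched in a few lines. This is a Python illustration rather than the Perl bot or the Lua method themselves; the rule table, its "from"/"to" layout and the code point ranges are assumptions made for the example, and the real rules on Wiktionary may differ.

```python
import re

# Hypothetical rule table: per language, patterns to replace and their replacements.
# A missing replacement means "delete the match".
RULES = {
    "he": {"from": ["[\u0591-\u05bd\u05bf-\u05c5\u05c7]"], "to": []},
    "ar": {"from": ["[\u064b-\u0652\u0670\u0640]"], "to": []},
}

def lua_replacement_to_python(repl):
    """Convert Lua capture references (%1) to Python ones (\\1)."""
    return re.sub(r"%(\d)", r"\\\1", repl)

def make_entry_name(text, lang):
    """Apply the language's substitution rules in order; unknown languages pass through."""
    rules = RULES.get(lang)
    if rules is None:
        return text
    for i, pattern in enumerate(rules["from"]):
        repl = rules["to"][i] if i < len(rules["to"]) else ""
        text = re.sub(pattern, lua_replacement_to_python(repl), text)
    return text

# The vocalized Hebrew and Arabic words from the thread, written as escapes.
hevel = "\u05d4\u05b6\u05d1\u05b6\u05dc"
miad = "\u0645\u0650\u0639\u064e\u062f"
print(make_entry_name(hevel, "he"))
print(make_entry_name(miad, "ar"))
```

Only simple character classes are used here, which is the subset where Lua patterns and regular expressions agree; anything beyond that would need the kind of validation step mentioned in the list above.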
- Thanks a lot for explaining your approach to removing the diacritics. I'll try it out. I guess it's rather slow, as you have to work over the internet with the MediaWiki servers. What I have done to run Lua modules offline is copy the MediaWiki lualib source from Phabricator (https://phabricator.wikimedia.org/diffusion/ELUA/browse/master/includes/engines/LuaCommon/lualib/). I had to make some minor changes, like changing filenames and adding a soft link. Then I could load the lualib modules from a standard standalone Lua interpreter (Lua 5.1.5 on Ubuntu). Some of the modules and Lua functionality still don't work, but the ustring functions, which are the most important, do. I'm no Lua expert, so my guess is that a Lua wizard will figure things out easily and get the rest to work. If you want to have a look at the code, I've uploaded it at [10]. What I have working so far are some automatic transliteration (ru, bg, sh, ko) and pronunciation (es, pt, sh) modules interfaced with a bidirectional pipe. Matthias Buchmeier (talk) 12:46, 17 June 2018 (UTC)
- PS: Let me know if you manage to run Language:makeEntryName offline.
- I've finally fixed the issue and removed the vocalization from Arabic and Hebrew. (Notifying Ruakh, Atitarev): please have a look and see if I've done OK.Matthias Buchmeier (talk) 14:14, 11 November 2018 (UTC)
- Great, thanks! I will check and let you know. --Anatoli T. (обсудить/вклад) 22:59, 11 November 2018 (UTC)
Hey MB. How's life, dude? So, I made hubiérase, which I probably shouldn't have wasted my time making. Anyway, it seems that Template:es-compound of can't handle a past subjunctive tense. Any way you can tweak the Template? I, let me remind you, am awful at templates. --Harmonicaplayer (talk) 07:53, 29 June 2018 (UTC)
- Oh, and habríale contains a conditional. That could be added into the Template too. --Harmonicaplayer (talk) 07:56, 29 June 2018 (UTC)
- These forms are passive voice, like véanse. I'll have a look at the template and check how to implement past tense. Matthias Buchmeier (talk) 08:34, 29 June 2018 (UTC)
- I think these are only for compound tenses like pretérito pluscuamperfecto (hubiérase dicho), condicional perfecto (habríale dicho), so they only exist for forms of haber. I think it's not worth the pain to add all forms to Template:es-compound of. You can instead use Template:es-verb form of to specify the form and code the rest manually. Matthias Buchmeier (talk) 08:46, 29 June 2018 (UTC)
- OK, I manually wrote sth for habríale. It doesn't look like the best solution, but a wiki is wiki, as the saying goes. --Harmonicaplayer (talk) 16:59, 29 June 2018 (UTC)
Hello, Matthias. Many thanks for the Russian-English dictionary: User:Matthias_Buchmeier#Russian-English. You finally made it. There are minor issues, the same as described in the Hebrew/Arabic topic above: the diacritics should be stripped from the links. So, in User:Matthias_Buchmeier/ru-en-b, the link should point to благозвучно rather than благозву́чно if adding templates is too expensive; with a template, e.g. благозву́чно (blagozvúčno) links correctly to благозвучно#Russian.
(Notifying Benwing2, Cinemantique, KoreanQuoter, Useigor, Wanjuscha, Wikitiki89, Stephen G. Brown, Per utramque cavernam, Guldrelokk, Fay Freak): Advising the Russian language editors that the Russian-English offline Wiktionary is now available. The opposite User:Matthias_Buchmeier#English-Russian has been around for a long time. --Anatoli T. (обсудить/вклад) 04:20, 8 November 2018 (UTC)
- Thanks for the hint, I didn't think about the diacritics. They should be fixed now. Cheers Matthias Buchmeier (talk) 22:41, 8 November 2018 (UTC)
- Thanks! --Anatoli T. (обсудить/вклад) 22:44, 8 November 2018 (UTC)
Hi Matthias, I noticed you changed all the "transitive" labels to "intransitive." Are you sure? The definitions don't make sense as intransitive definitions. I got them from the Collins dictionary, as well as what was in the Wiktionary entry when I edited it. I doubt the Collins dictionary is wrong, so if this usage isn't normal, labels should be added to that effect rather than saying there are no transitive senses of arbeiten. But I'm not a native speaker, so I don't know enough to make any adjustments myself! Andrew Sheedy (talk) 17:20, 13 December 2018 (UTC)
- I'm very sure that arbeiten can't take any real accusative objects. The situation seems a bit different with the examples in the Collins dictionary, where the pronouns was and nichts are used. But you cannot ask the standard question for accusative objects, Wen oder was arbeitest du?, which sounds completely wrong. So I guess that the Collins examples don't count as transitive usage. That's also what the German Wiktionary claims. If you keep these senses as transitive, at least some examples and an explanation would have to be given; otherwise people will get it wrong and use arbeiten with an accusative object. Matthias Buchmeier (talk) 21:01, 13 December 2018 (UTC)
- OK, thanks. Would you be able to do that? My German isn't good enough that I'm comfortable providing examples. Also, note that the definitions might need modifying, as the verbs I used are generally transitive. Andrew Sheedy (talk) 23:47, 13 December 2018 (UTC)
- Thanks for the adjustments! Andrew Sheedy (talk) 20:23, 14 December 2018 (UTC)
qualcosa in Italiano
Hi, I noticed that you changed the gender of the Italian pronoun qualcosa to feminine. It does look feminine, but in modern usage it is generally masculine, see: garzanti or it:qualcosa. Were you thinking of a particular historical usage that would need to be taken into account when you changed it? Thanks. Isomorphyc (talk) 18:46, 28 January 2019 (UTC)
- No, I just added a headline template and additionally committed a typo. I've fixed it now. Matthias Buchmeier (talk) 21:17, 28 January 2019 (UTC)
- Thanks, I imagined it was a typo. I'm going to change it to masculine and leave an explanatory footnote, though, as I think the original m/f was probably a mistake. Isomorphyc (talk) 01:10, 29 January 2019 (UTC)
Spanish part-of-speech tagset
Hi, Thanks for the Bilingual Dictionaries for Offline Use!
Is there a list of what the part-of-speech tag abbreviations mean for the Spanish-English entries? Below is the set of the abbreviations.
set(['art' 'suffix' 'vi' 'affix' 'vp' 'vr' 'num' 'vt' 'interj' 'phrase' 'conj' 'adv' 'vit' 'vitr' 'vtp' 'vti' 'prop' 'pron' 'vtr' 'vtir' 'initialism' 'adj' 'prep' 'fp' 'vrt' 'acronym' 'symbol' 'abbr' 'letter' 'proverb' 'contraction' 'vir' 'mf' 'vtrp' 'determiner' 'particle' 'f' 'm' 'prefix' 'n' 'mp' 'v'])
Ccrowner (talk) 03:37, 12 February 2019 (UTC)CCrowner
- Hi Ccrowner, I don't have a list for the part of speech abbreviations yet but can compile it by Friday. Matthias Buchmeier (talk) 21:36, 13 February 2019 (UTC)
OK thanks!
- OK, here you are. I didn't yet spend much thought on the tagset, so feel free to suggest improvements. Matthias Buchmeier (talk) 15:19, 14 February 2019 (UTC)
74.77.51.166 22:03, 15 February 2019 (UTC) This is very helpful. I'm working with the data and will let you know regarding possible improvements. Thanks.
Commas in the Portuguese English dictionary data
Hi Matthias, Thanks for your work, it's brilliant. I downloaded your Portuguese English dictionary data and I'm in the process of converting this into an SQLite database using Java. I have a problem with this because of your data structure. You have divided multiple Portuguese definitions for a single English word or phrase with a comma, but unfortunately some of these definitions already contain commas! So it becomes impossible to divide these correctly. Is it possible to change the divider to some other signal? Maybe an at-sign "@"?
Here is a very crude example of the problem:
fuck you {n} (fuck you) :: vai-te foder, fode-te, [very vulgar, offensive] foda-se, vai se foder, vá tomar no cu, tome no cu
In my program I check for commas as dividers between each definition. This results in the following definitions:
1) fuck you n fuck you vai-te foder
2) fuck you n fuck you fode-te
3) fuck you n fuck you [very vulgar
4) fuck you n fuck you offensive] foda-se
5) fuck you n fuck you vai se foder
6) fuck you n fuck you vá tomar no cu
7) fuck you n fuck you tome no cu
whereas normal (almost all) entries are like this:
abacist {n} (One who uses an abacus.) :: calculador, abacista {m} {f}
giving:
1) abacist n One who uses an abacus. calculador
2) abacist n One who uses an abacus. abacista m
I imagine that the data is produced by a shell script and if so, it should be a simple matter to change the comma to another unique character (such as @) not found in the data.
I think that the awk script at: https://en.wiktionary.org/wiki/User:Matthias_Buchmeier/trans-en-es.awk may contain this logic in producing the field DL. But it's just a bit complicated for me.
- Hi,
- there is actually little logic in the data structure, because translations are plain text (with some wiki formatting and wiki- and Lua-template functions), which have been added by the contributors. You can have a look at the translation's wiki-code by clicking on the edit button on the translations section headline on the page fuck you.
- Fortunately most translations now use templates, so I think it should be possible to change the delimiter-character between individual translations (now comma or semicolon) to some other character. However this will likely not work perfectly because some older translations still use plain text without templates.
- Let me see what I can do to solve the delimiter issue. Matthias Buchmeier (talk) 11:47, 15 February 2019 (UTC)
- OK, I've added an option (TransSep variable) to convert the delimiter to semicolon. This should work for most translations as semicolons are very rarely used in translations. Please check the updated dictionary I sent you via email. I hope that it helps you with the import. Matthias Buchmeier (talk) 15:25, 15 February 2019 (UTC)
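For readers importing the data themselves, the comma problem described in this thread can also be worked around on the reading side. The following is only a sketch in Python: the function name and the bracket handling are illustrative assumptions and are not part of the awk script. It splits the translation side of an entry on a separator only when that separator sits outside [...], (...) and {...}.

```python
def split_translations(entry, sep=","):
    """Split one 'headword {pos} (gloss) :: translations' line.

    Returns (left_side, [translations]). A separator that appears
    inside [], () or {} is kept as part of the translation.
    """
    left, _, right = entry.partition(" :: ")
    closers = {"[": "]", "(": ")", "{": "}"}
    stack = []                 # closing brackets we are still waiting for
    parts, current = [], []
    for ch in right:
        if ch in closers:
            stack.append(closers[ch])
        elif stack and ch == stack[-1]:
            stack.pop()
        if ch == sep and not stack:
            # top-level separator: finish the current translation
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    parts.append("".join(current).strip())
    return left, [p for p in parts if p]


line = ("fuck you {n} (fuck you) :: vai-te foder, fode-te, "
        "[very vulgar, offensive] foda-se, vai se foder, "
        "vá tomar no cu, tome no cu")
left, translations = split_translations(line)
for t in translations:
    print(t)
```

With the semicolon delimiter produced by the TransSep option, the same function can be called with sep=";". Unbalanced brackets in hand-written translations would still confuse it, so the result should be spot-checked.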
'Romanization'
Hey, thanks for your cleanup work on all the mistakes I made by not adding the word "Romanization" to pinyin entries. Is there any way I can go through my history and systematically find all of the entries affected? --Geographyinitiative (talk) 11:17, 8 June 2019 (UTC)
- Yes, I have found these additional pages with missing part-of-speech headlines in the last data dump but haven't yet had the time to fix them: òuféi, kāipāi, Yuànlǐ, Zhuólán, Tóufèn, shāngquān, wéibǔ, Fú'ěrmóshā. Matthias Buchmeier (talk) 12:07, 8 June 2019 (UTC)
Hey MB. What's your purpose in life?
Interested in English <-> Esperanto and Danish - English
Please include them :)--So9q (talk) 10:26, 2 September 2019 (UTC)
- I'll upload it with the next update. Matthias Buchmeier (talk) 05:34, 3 September 2019 (UTC)
- Thanks--So9q (talk) 09:47, 3 September 2019 (UTC)
Community Insights Survey
Share your experience in this survey
Hi Matthias Buchmeier,
The Wikimedia Foundation is asking for your feedback in a survey about your experience with Wiktionary and Wikimedia. The purpose of this survey is to learn how well the Foundation is supporting your work on wiki and how we can change or improve things in the future. The opinions you share will directly affect the current and future work of the Wikimedia Foundation.
Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.
This survey is hosted by a third-party and governed by this privacy statement (in English).
Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.
Sincerely,
Reminder: Community Insights Survey
Share your experience in this survey
Hi Matthias Buchmeier,
A couple of weeks ago, we invited you to take the Community Insights Survey. It is the Wikimedia Foundation’s annual survey of our global communities. We want to learn how well we support your work on wiki. We are 10% towards our goal for participation. If you have not already taken the survey, you can help us reach our goal! Your voice matters to us.
Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.
This survey is hosted by a third-party and governed by this privacy statement (in English).
Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.
Sincerely,
Reminder: Community Insights Survey
Share your experience in this survey
Hi Matthias Buchmeier,
There are only a few weeks left to take the Community Insights Survey! We are 30% towards our goal for participation. If you have not already taken the survey, you can help us reach our goal! With this poll, the Wikimedia Foundation gathers feedback on how well we support your work on wiki. It only takes 15-25 minutes to complete, and it has a direct impact on the support we provide.
Please take 15 to 25 minutes to give your feedback through this survey. It is available in various languages.
This survey is hosted by a third-party and governed by this privacy statement (in English).
Find more information about this project. Email us if you have any questions, or if you don't want to receive future messages about taking this survey.
Sincerely,
German past participles error
Hi Matthias, thanks for providing these useful dictionaries.
The latest update of the de-en dictionary has an error: All past participles say "past participle of de" (where "de" is written instead of the lemma form of the verb).
Example at https://en.wiktionary.org/wiki/User:Matthias_Buchmeier/de-en-a
antizipiert {v} :: past participle of de
Instead of
antizipiert {v} :: past participle of antizipieren
Octonon (talk) 00:13, 8 December 2019 (UTC)
- Thanks a lot for the bug-report. The problem also occurs for alternative spellings:
- abbeissen {v} :: alternative spelling of de
- It's a result of a changed template syntax. I'll fix the bug in the next version. Matthias Buchmeier (talk) 10:40, 8 December 2019 (UTC)
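The thread does not say which template change caused this. As a purely illustrative sketch, the Python below assumes the change was the language code moving from a named lang= parameter to the first positional parameter of form-of templates. Both template layouts and the function name are assumptions, not taken from the actual script.

```python
def form_of_lemma(template, lang="de"):
    """Return the lemma named in a form-of template call.

    Handles two hypothetical layouts:
      {{past participle of|de|antizipieren}}       (language code first)
      {{past participle of|antizipieren|lang=de}}  (named lang parameter)
    """
    inner = template.strip()[2:-2]        # drop the surrounding {{ and }}
    fields = inner.split("|")[1:]         # drop the template name
    positional = [f for f in fields if "=" not in f]
    if positional and positional[0] == lang:
        # language code occupies the first positional slot: skip it
        positional = positional[1:]
    return positional[0] if positional else None


print(form_of_lemma("{{past participle of|de|antizipieren}}"))
print(form_of_lemma("{{past participle of|antizipieren|lang=de}}"))
```

A parser that always reads the first positional parameter as the lemma would return "de" for the first layout, which matches the symptom reported above. A lemma that happens to be spelled like the language code would need extra care.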
Add synonym support
Thank you for your wonderful scripts. I see "include synonyms and antonyms (syn and ant template)" on the to-do list, this would be a very useful feature and I will be very happy when it's added.
- Thanks for the feedback. Do you have any suggestions how to format the synonyms? Matthias Buchmeier (talk) 08:43, 27 April 2020 (UTC)
- Ideally they would be appended to the definition for which they're listed as synonyms, using some sort of separator that would be unlikely to be used in the definition. Reusing :: seems like an obvious choice but could break existing scripts that assume it only occurs once per line. ~~ looks pretty and is easy to strip off for anyone who doesn't want the synonyms.
- With that addition, the definitions for Spanish "coger" would become:
coger {v} [Colombia, Cuba, Dominican Republic, Philippines, Puerto Rico, Spain] :: to take, catch, hold, get ~~ agarrarse, tomar, prender, asir
coger {v} :: to pick, harvest ~~ cosechar, recolectar
coger {v} :: to fish ~~ pescar, atrapar
coger {v} :: to seize, arrest; to overtake ~~ atrapar, aprehender, capturar
coger {v} :: to get (a joke)
coger {v} [Spain] :: to imitate, learn
coger {v} [vulgar, Argentina, Chile, Paraguay, Mexico, Central America] :: to have sex ~~ joder
coger {v} [Spain] :: to choose (a direction, route, when driving or walking) ~~ manejar, canducir, irse
coger {v} [Spain] :: to turn to (when driving or walking) ~~ dar vuelta
coger {v} [Spain] :: to board (means of transportation)
- I agree that synonyms should be easily stripped and machine readable. But the dictionary should also remain human readable. Also, logically, given their language, the synonyms belong on the left side of the double column. What about:
coger {v} | cosechar, recolectar :: to pick, harvest
- or
coger {v} |SYN: cosechar, recolectar :: to pick, harvest
- Both are also compatible with the ding dictionary browser. Matthias Buchmeier (talk) 16:55, 1 May 2020 (UTC)
- Both are completely fine by me. I'd go with | just because it saves 4 bytes per line and looks less cluttered than |SYN:
- I am working on the feature and hope to finish it by next weekend. Matthias Buchmeier (talk) 12:58, 5 May 2020 (UTC)
- The feature is ready. You can download a precompiled Spanish-English dictionary with synonyms from here Matthias Buchmeier (talk) 13:30, 10 May 2020 (UTC)
- Thank you for taking the time to do this, your script is a wonderful resource!
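A reader-side sketch of the synonym layout discussed in this thread may help anyone who wants to consume or strip the new field. It is written in Python with hypothetical function names, and it assumes the synonyms sit between a | and the :: separator, optionally prefixed with SYN:, and are separated by commas or semicolons.

```python
import re


def parse_entry(line):
    """Parse 'headword {pos} | syn1, syn2 :: definition' into a dict."""
    left, _, definition = line.partition(" :: ")
    head, _, syn_part = left.partition("|")
    syn_part = syn_part.strip()
    if syn_part.startswith("SYN:"):
        syn_part = syn_part[len("SYN:"):]
    # synonyms may be separated by commas or semicolons
    synonyms = [s.strip() for s in re.split(r"[;,]", syn_part) if s.strip()]
    return {
        "head": head.strip(),
        "synonyms": synonyms,
        "definition": definition.strip(),
    }


def strip_synonyms(line):
    """Return the line without its synonym field."""
    left, sep, definition = line.partition(" :: ")
    return left.partition("|")[0].rstrip() + sep + definition


entry = parse_entry("coger {v} | cosechar, recolectar :: to pick, harvest")
print(entry["head"], entry["synonyms"], entry["definition"])
print(strip_synonyms("coger {v} | cosechar, recolectar :: to pick, harvest"))
```

The sketch treats the first | on the left side as the start of the synonym field, so a headword or label containing | would be misparsed.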
Bug in script
It seems there's a bug in the latest revision of the script. If you use ENABLE_SYN, the verb type accumulates with each definition:
tirar {vt} | lanzar; arrojar; botar :: to throw, to toss
tirar {vtt} | echar; botar :: to throw out, to toss
tirar {vttt} :: to shoot; to launch
tirar {vtttt} | hacer; tomar :: to take (a photograph)
tirar {vttttt} | imprimir :: to print
tirar {vtttttt} :: to skip (e.g. a rock or stone)
tirar {vttttttt} | derribar :: to knock over; to knock down
tirar {vtittttttti} :: to roll (dice)
The issue seems to be in replace_template(). Maybe it's getting called repeatedly with the synonym stuff?
function replace_template(tpar, n_unnamed, outp, i, j, start) {
[snip]
# there might be labels on the headline
if((headline == 1)&&(pos=="v")) gend = gend gend2;
JeffDoozan (talk) 16:28, 17 May 2020 (UTC)
- Thanks for the bug report. I've just fixed it. Matthias Buchmeier (talk) 17:33, 17 May 2020 (UTC)
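Regressions like this one can be caught by checking a generated dictionary against the known tag set before publishing. The Python sketch below is an assumption about how such a check might look, not part of the awk script: it takes the part-of-speech abbreviations listed in the "Spanish part-of-speech tagset" section of this page as a whitelist and reports any first {...} tag on the left side that is not in it.

```python
import re

# Tag set as listed in the "Spanish part-of-speech tagset" section.
KNOWN_TAGS = {
    "art", "suffix", "vi", "affix", "vp", "vr", "num", "vt", "interj",
    "phrase", "conj", "adv", "vit", "vitr", "vtp", "vti", "prop", "pron",
    "vtr", "vtir", "initialism", "adj", "prep", "fp", "vrt", "acronym",
    "symbol", "abbr", "letter", "proverb", "contraction", "vir", "mf",
    "vtrp", "determiner", "particle", "f", "m", "prefix", "n", "mp", "v",
}


def suspicious_tag(line):
    """Return the first {tag} left of '::' if it is not a known tag."""
    left = line.partition(" :: ")[0]
    match = re.search(r"\{([^}]*)\}", left)
    if match and match.group(1) not in KNOWN_TAGS:
        return match.group(1)
    return None


sample = [
    "tirar {vt} | lanzar; arrojar; botar :: to throw, to toss",
    "tirar {vtt} | echar; botar :: to throw out, to toss",
    "tirar {vtittttttti} :: to roll (dice)",
]
for line in sample:
    tag = suspicious_tag(line)
    if tag is not None:
        print("unexpected tag", tag, "in:", line)
```

The whitelist only covers the Spanish-English tag set quoted on this page, so other language pairs would need their own list.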
Dictionaries extracted from Translations Sections - new requests
Hi,
Are you able to add more languages to the list? Please consider, if it's OK. In particular, I am interested in having Belarusian and Ukrainian. The coverage is not great yet but your lists would help us see the volumes in growth, important red links to be filled and bad or badly formatted translations. Please ping, so that I don't miss your reply when you have a chance to respond. --Anatoli T. (обсудить/вклад) 11:07, 13 September 2020 (UTC)
- @Atitarev Sure, dictionaries from translations sections are always easy apart from automatic transcriptions. Dictionaries from non-English language sections are fairly easy except for languages with complicated scripts like Japanese or Arabic. I'll see if I manage to add Belarusian and Ukrainian by next weekend. Matthias Buchmeier (talk) 15:57, 13 September 2020 (UTC)
- Thank you! Please use the same methods for accent stripping as you do with Russian. Notifying @Benwing2, PUC. --Anatoli T. (обсудить/вклад) 00:51, 14 September 2020 (UTC)
- @Atitarev There has to be at least one extra vowel (і) from which accents have to be stripped. Are there any other vowels which don't exist in Russian and need accent stripping? Matthias Buchmeier (talk) 17:06, 19 September 2020 (UTC)
- Yes, also Ukrainian є. --Anatoli T. (обсудить/вклад) 22:02, 19 September 2020 (UTC)
@Atitarev Is there any simple way of getting the gender and animacy from the parameters of headline templates uk-noun and be-noun? Matthias Buchmeier (talk) 16:54, 20 September 2020 (UTC)
- I am not sure about gender/animacy parameter. They are unnamed. @Benwing2 is the primary developer of the modules. I think he might help.
- I have missed one more letter - Ukrainian ї. Here are full lists of vowels with accents to be stripped from translations:
- Belarusian: А́, а́, Е́, е́, І́, і́, О́, о́, У́, у́, Ы́, ы́, Э́, э́, Ю́, ю́, Я́, я́
- Bulgarian: А́, а́, Е́, е́, И́, и́, О́, о́, У́, у́, Ъ́, ъ́, Ю́, ю́, Я́, я́
- Russian: А́, а́, Е́, е́, И́, и́, О́, о́, У́, у́, Ы́, ы́, Э́, э́, Ю́, ю́, Я́, я́
- Ukrainian: А́, а́, Е́, е́, Є́, є́, И́, и́, І́, і́, Ї́, ї́, О́, о́, У́, у́, Ю́, ю́, Я́, я́
- You will find that there are also some archaic vowels used, e.g. in WT:RU TR, which we normally don't use in translations.
- Thank you for adding the offline dictionaries! --Anatoli T. (обсудить/вклад) 23:00, 20 September 2020 (UTC)
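The accented vowels in the lists above are ordinary Cyrillic letters followed by the combining acute accent (U+0301), so one way to strip stress marks without maintaining per-language vowel tables is to delete that single code point. This is a sketch in Python under that assumption, and not necessarily how the awk script does it.

```python
COMBINING_ACUTE = "\u0301"  # the stress mark used in Wiktionary headwords


def strip_stress(text):
    """Remove stress marks; letters such as й, ї and ё are left untouched
    because they do not contain U+0301."""
    return text.replace(COMBINING_ACUTE, "")


# Escapes are used so the combining character is visible in the source.
for word in ["украї\u0301нський", "белару\u0301ская", "ру\u0301сский"]:
    print(strip_stress(word))
```

Because the function only removes U+0301, it works the same for Belarusian, Bulgarian, Russian and Ukrainian, including the extra vowels і, є and ї mentioned above.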
- @Matthias Buchmeier, Atitarev If you call {{uk-generate-noun-args|...}} or {{be-generate-noun-args|...}}, where ... is the same parameters passed to {{uk-noun}} or {{be-noun}} respectively, you will get back a string containing all the inflected forms as well as the gender/number/animacy, e.g. n-in or m-pr-p (pr = "personal"). The way to parse the resulting string is to first split on |, then split each element on = to get name, value pairs, then replace <!> with | and <-> with = in the resulting values, then finally split on a comma. Here is some Python code to do everything but the final split on comma:
import re

def split_generate_args(tempresult):
    args = {}
    for arg in re.split(r"\|", tempresult):
        name, value = arg.split("=")
        value = value.replace("<!>", "|").replace("<->", "=")
        # With manually specified declensions, we get back "-" for unspecified
        # forms, which need to be omitted; otherwise they're automatically omitted.
        if value != "-":
            args[name] = value
    return args
- Benwing2 (talk) 23:15, 20 September 2020 (UTC)
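As a follow-up sketch, the parsing recipe described above can be extended with the final split on commas so that each slot maps to a list of forms. The function name and the sample string are made up for illustration. The real slot names returned by {{uk-generate-noun-args}} and {{be-generate-noun-args}} are not given in this thread.

```python
def parse_generate_args(tempresult):
    """Parse the template output into {slot_name: [form, ...]}."""
    forms = {}
    for arg in tempresult.split("|"):
        name, value = arg.split("=", 1)
        # undo the escaping of "|" and "=" inside values
        value = value.replace("<!>", "|").replace("<->", "=")
        if value != "-":          # "-" marks an unspecified form
            forms[name] = value.split(",")
    return forms


# Hypothetical output string; the slot names are invented for the example.
sample = "g=m-an|nom_s=кіт|gen_s=кота,коту|voc_p=-"
for slot, values in parse_generate_args(sample).items():
    print(slot, values)
```

Here the gender/number/animacy code ends up as a one-element list under its own key, so callers can treat every slot uniformly.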
- BTW for automatic transliteration you should just invoke {{xlit}} to do the work rather than trying to implement it yourself. This will work for all but maybe Chinese, Japanese and Korean, which seem to have specialized versions of most templates. Benwing2 (talk) 23:23, 20 September 2020 (UTC)
- @Benwing2 That's the best solution. However, I had trouble getting some of the templates to work offline because of heavy transclusions and use of wikimedia functions. Do you know of anyone else using Wikimedia-Lua functions offline? Matthias Buchmeier (talk) 05:43, 21 September 2020 (UTC)
- @Matthias Buchmeier When you say "offline" what are you referring to? I use pywikibot site.expand_text() to expand an arbitrary template. This works for all templates, because it calls Wiktionary itself to do the template expansion; you don't have to do any extra work to get this to work. Benwing2 (talk) 05:53, 21 September 2020 (UTC)
- Under the hood, I think there's an API you can use to accomplish the same thing. Benwing2 (talk) 05:54, 21 September 2020 (UTC)
- I run the lua scripts locally from my harddrive. I didn't know of site.expand_text(). I'll try it out, but I think this approach could be quite slow when expanding thousands of templates. Matthias Buchmeier (talk) 06:02, 21 September 2020 (UTC)
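The API mentioned above is presumably the MediaWiki action=expandtemplates module. The sketch below, in Python with only the standard library, shows one possible way to reduce the round trips that the speed concern is about: several template calls are joined with a marker and expanded in one request. The marker, the function names, the User-Agent string and the {{xlit|uk|...}} parameter order are all assumptions, and the sketch relies on the marker passing through expansion unchanged.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://en.wiktionary.org/w/api.php"
MARKER = "@@SEP@@"  # hypothetical separator, assumed absent from the data


def build_expand_url(wikitext):
    """Build an action=expandtemplates request URL for some wikitext."""
    params = {
        "action": "expandtemplates",
        "format": "json",
        "prop": "wikitext",
        "text": wikitext,
    }
    return API_URL + "?" + urlencode(params)


def parse_expand_response(body):
    """Extract the expanded wikitext from the JSON response body."""
    return json.loads(body)["expandtemplates"]["wikitext"]


def expand_many(templates):
    """Expand several template calls with a single HTTP request."""
    url = build_expand_url(MARKER.join(templates))
    request = Request(url, headers={"User-Agent": "offline-dict-sketch/0.1"})
    with urlopen(request) as response:
        expanded = parse_expand_response(response.read().decode("utf-8"))
    return expanded.split(MARKER)


# Offline demonstration of the request building only (no network access).
print(build_expand_url("{{xlit|uk|кіт}}" + MARKER + "{{xlit|be|кот}}"))
```

For large batches a GET URL becomes too long, so a POST request with the same parameters would be the safer choice, and batch sizes should respect the wiki's usage limits.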
Wiktionary dumps in git?
Hi, thanks for making the downloadable dictionaries available! Have you considered putting them in a git repository somewhere? (At least the ones in ding format, I don't know if the other formats are plain text or binary, and the latter wouldn't make so much sense). This way, it would be easier for the user to update their files, and it should even be possible to have a periodic CI job update the git repo with the latest Wiktionary data. IjIzaR3KoSmYq (talk) 20:19, 13 January 2021 (UTC)
- Thanks for the feature request. Can you recommend a free git hoster? Matthias Buchmeier (talk) 07:03, 15 January 2021 (UTC)
- The standard choices are https://github.com/ and https://gitlab.com/ IjIzaR3KoSmYq (talk) 12:55, 16 January 2021 (UTC)
New languages
Hi Matthias, thank you for your longstanding work on these dictionaries. I’ve been using them for almost a decade! Would you consider adding the following language pairs?
- English-Slovak (en-sk)
- English-Slovene (en-sl)
- English-Bengali (en-bn)
- English-Tamil (en-ta)
- English-Marathi (en-mr)
- English-Telugu (en-te)
- English-Kannada (en-kn)
- English-Gujarati (en-gu)
- English-Malayalam (en-ml)
- English-Urdu (en-ur)
Please let me know if there’s anything I can do to help. —NovaPatch (talk) 00:13, 27 July 2021 (UTC)
How we will see unregistered users
Hi!
You get this message because you are an admin on a Wikimedia wiki.
When someone edits a Wikimedia wiki without being logged in today, we show their IP address. As you may already know, we will not be able to do this in the future. This is a decision by the Wikimedia Foundation Legal department, because norms and regulations for privacy online have changed.
Instead of the IP we will show a masked identity. You as an admin will still be able to access the IP. There will also be a new user right for those who need to see the full IPs of unregistered users to fight vandalism, harassment and spam without being admins. Patrollers will also see part of the IP even without this user right. We are also working on better tools to help.
If you have not seen it before, you can read more on Meta. If you want to make sure you don’t miss technical changes on the Wikimedia wikis, you can subscribe to the weekly technical newsletter.
We have two suggested ways this identity could work. We would appreciate your feedback on which way you think would work best for you and your wiki, now and in the future. You can let us know on the talk page. You can write in your language. The suggestions were posted in October and we will decide after 17 January.
Thank you. /Johan (WMF)
18:14, 4 January 2022 (UTC)
What happened to your page?
Where did all those dictionaries go? Are they lost now forever? 82.40.96.28 13:47, 12 August 2022 (UTC)
- No, it's just that the template I used to format the indexes was deleted. I'll see if I can find time to fix that by tomorrow. Matthias Buchmeier (talk) 19:59, 12 August 2022 (UTC)
Well buddy I don't understand at least three of the words you just said, and I speak English! But thanks for taking the time to respond. Those dictionaries are really useful. 👍 Peace
Please review a newcomer's edit
Hello :)
I've made an edit in a page originally created by you: https://en.wiktionary.org/w/index.php?title=c%C3%A2mera&oldid=prev&diff=78476272
I've been a sporadic editor in Wikipedia, but this is my first edit in Wiktionary ever. Since you're a more experienced editor, could you please take a look to make sure it meets Wiktionary quality standards?
Thank you very much :)
Spanish Frequency Dictionary
Hi, I am using the Spanish Frequency List to assist with my linguistics thesis. (Thanks so much for that, BTW!) I was wondering if you had any additional information on the 6527 TV-series and movies used to generate the list. I'm particularly interested in the regions/specific dialectal makeup of the shows/movies. Thank you! Dawn Loggins (talk) 20:58, 12 April 2024 (UTC)
Admin rights
Hi, I have removed your admin rights due to our policy on admin inactivity, as you have not used any admin tools in the past five years. This removal is without prejudice and you can request your admin rights to be restored at any time. — SURJECTION / T / C / L / 11:56, 29 August 2024 (UTC)