Module:Language/name/data/iana data extraction tool

This is a crude tool that reads a local copy of an iso-639-3_Name_Index_YYYYMMDD.tab file from sil.org and extracts the information needed to create the data table held by Module:Language/data/ISO_639-3.

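Usage

To use this tool:

  1. Open a new sandbox page and paste this {{#invoke:}} on the first line of that page:
    {{#invoke:Language/name/data/ISO 639-3 data extraction tool|ISO_639_3_extract|file-date=YYYYMMDD}}
    where YYYYMMDD is the year, month, and day taken from the .tab file name (used to place a file-date comment in Module:Language/data/ISO_639-3)
  2. Download the complete Code Tables Set UTF-8 version zip file
  3. Unzip iso-639-3_Name_Index_YYYYMMDD.tab and open it with a plain-text editor
  4. Copy its contents and paste them into the sandbox page below the {{#invoke:}}
  5. Click [Show preview]
  6. Wait
  7. Get the result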
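
The result is Lua source text intended for Module:Language/data/ISO_639-3. Judging from the output code in this module, the two sample languages used in the module's comments would come out roughly as shown below. This is an illustration only: the real output returns the table directly, and the local name `data` and the two lookups are assumptions added so the fragment can be run on its own.

```lua
-- File-Date: 20170217
-- In the tool's output this table follows a 'return'; it is bound to a
-- local here only so that the example lookups below can run.
local data = {
	["aaq"] = {"Eastern Abnaki"},
	["rar"] = {"Cook Islands Maori", "Rarotongan"},
	}

print (data["aaq"][1]);		-- first (or only) name recorded for a code
print (#data["rar"]);		-- number of names recorded for a code
```

Each code maps to a list of names, so a code with several names in the .tab file still has a single entry.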

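Some rough error checking inserts error messages into the output. There is no guarantee that these messages will be helpful. Search the tool's output for the word "error".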
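
For readers who want to try the extraction outside MediaWiki, here is a minimal sketch in plain Lua. It is not the module's code: `extract_names` and `sample` are hypothetical names, the sample rows are the ones quoted in the module's comments, and the sketch groups names in a table keyed by code, whereas the module edits the previous output string.

```lua
-- Hypothetical standalone helper; not part of the module.
-- Collects the 'forward' name(s) for each three-letter code, in file order.
local function extract_names (content)
	local order = {};								-- codes in order of first appearance
	local names = {};								-- code -> list of names
	for line in string.gmatch (content, '[^\n]+') do
		-- <id>\t<name>\t<inverted name>; the inverted name is ignored
		local code, name = string.match (line, '^(%a%a%a)\t([^\t]+)\t');
		if code then
			if not names[code] then
				names[code] = {};
				table.insert (order, code);
			end
			table.insert (names[code], name);
		end
	end
	return order, names;
end

local sample = 'aaq\tEastern Abnaki\tAbnaki, Eastern\n'
	.. 'rar\tCook Islands Maori\tMaori, Cook Islands\n'
	.. 'rar\tRarotongan\tRarotongan\n';

local order, names = extract_names (sample);
for _, code in ipairs (order) do
	print (string.format ('["%s"] = {"%s"}', code, table.concat (names[code], '", "')));
end
```

Lines that do not start with a three-letter code followed by a tab are silently skipped in this sketch, so it does not reproduce the tool's error entries.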

require('Module:No globals');
local p = {};

--[=[------------------------< I S O _ 6 3 9 _ 3 _ E X T R A C T >---------------------------------------------

{{#invoke:Language/name/data/ISO 639-3 data extraction tool|ISO_639_3_extract|file-date=20170217}}

reads a local copy of iso-639-3_Name_Index_YYYYMMDD.tab (where YYYYMMDD is the release date).  Download that file
in zip form from http://www-01.sil.org/iso639-3/download.asp (use the UTF-8 zip)

useful lines in the file have the form:
	<id>\t<name>\t<inverted name>\n
where:
	<id> is the three-character ISO 639-3 language code
	<name> is the language 'name'
	<inverted name> is the language name in 'last-name, first-name(s)' form; this part is ignored
	
	like this:
		aaq	Eastern Abnaki	Abnaki, Eastern

when a language code has more than one name, the code is repeated for each additional name:
	rar	Cook Islands Maori	Maori, Cook Islands
	rar	Rarotongan	Rarotongan

]=]

function p.ISO_639_3_extract (frame)
	local page = mw.title.getCurrentTitle();									-- get a page object for this page
	local content = page:getContent();											-- get unparsed content
	local lang_table = {};														-- languages go here

	local file_date = 'File-Date: ' .. (frame.args["file-date"] or '');							-- set the file date line from |file-date=; empty when the parameter is omitted

	for code, name in mw.ustring.gmatch (content, '%f[%a](%a%a%a)\t([^\t]+)\t[^\n]+\n') do		-- get code and 'forward' name
		if code then
			if string.find (lang_table[#lang_table] or '', '^%[\"' .. code) then				-- if this is an additional name for code ('or' empty string for first time when lang_table[#lang_table] is nil)
				lang_table[#lang_table] = mw.ustring.gsub (lang_table[#lang_table], '}$', '');	-- remove trailing brace from previous name
				lang_table[#lang_table] = lang_table[#lang_table] .. ', \"' .. name .. '\"}';	-- add this name with new brace 
			else
				table.insert (lang_table, "[\"" .. code .. "\"] = {\"" .. name .. "\"}");		-- make new table entry
			end
		elseif not code then
			table.insert (lang_table, "[\"error\"] = {\"" .. tostring (name) .. "\"}");			-- code should never be nil, but inserting an error entry in the final output can be helpful
		end
	end
																				-- make pretty output
	return "<br /><pre>-- " .. file_date .. "<br />return {<br />&#9;" .. table.concat (lang_table, ',<br />&#9;') .. "<br />&#9;}<br />" .. "</pre>";
end

return p;