 _____ _                    _
|  ___| |__   ___ __ _  ___| |__   ___
| |_  | '_ \ / __/ _` |/ __| '_ \ / _ \
|  _| | |_) | (_| (_| | (__| | | |  __/
|_|   |_.__/ \___\__,_|\___|_| |_|\___|

http://angg.twu.net/fbcache/README.html
http://angg.twu.net/fbcache/fbcache.lua.html  (current version)
http://angg.twu.net/fbcache/urls.lua.html     (previous version)

By: Eduardo Ochs <[email protected]>
Version: 2014nov11
License: GPL3
Contributors: Iury Oliveira
              Pedro Nascimento
              Srinivas Mangipudi

Fbcache is a library for keeping a cache of interesting Facebook
posts. More precisely, it is a set of functions that convert between
Facebook URLs (permalinks) and names of local files in the cache, and
between Facebook URLs and the IDs used internally by Facebook; that
connect to Facebook with a subset of your user's permissions to
download posts readable by you; and that convert these downloaded
posts, which are in JSON, to text.

The main operations that fbcache performs can be summarized in this
diagram:

  fburl --> fbid --> json --> pplua --> txt --> html

Fburls
======
A "fburl" is a permalink like these:

  https://www.facebook.com/jean.wyllys/photos/a.201340996580582.48122.163566147024734/715611308486879/
  https://www.facebook.com/CopBlock/posts/10152706679833189

Fbids
=====
A "fbid" is one of the sequences of digits in a fburl; which one
depends on the kind of URL. I currently know at least 22 different
kinds of Facebook URLs; for some of them I know which part is the
fbid, and for others I don't (see the section "problems" below).
For the fburls above, the fbids are:

  715611308486879
  10152706679833189

Json
====
When we have a fbid and a valid access token for Facebook we can fetch
from Facebook the (current) content of that fbid; that fetching is
done via HTTP using Facebook's "Graph API", which returns an object in
JSON format.
That fetching is not something that we want to redo all the time -
it can take several seconds, posts can be deleted, our access token
may expire, and so on - so we cache these JSONs by saving them into a
cache directory.

Pplua
=====
All further processing of Facebook posts by fbcache depends on the
data in those JSONs. To speed things up, when we fetch a JSON we parse
it to generate a Lua table with exactly the same structure, then we
pretty-print the contents of that table and save the result to a file
in another cache directory. That pretty-printed table in Lua syntax
("pplua") can be read back by Lua with just an "eval", and is
reasonably human-friendly.

Txt
===
The "txt version" of a Facebook post is produced from a Facebook
object by extracting the text of _some_ of its fields - sender, date,
message, etc - and formatting that into a plain text format which is
(mostly) 70 columns wide. Many formats of txt output are possible, and
right now I am using a single, very simplistic one. The main ideas
here are:

  * once we have a cache of pplua objects, producing txts from them
    is VERY fast,
  * new txt formats are easy to add.

Html
====
Many txt formats are possible, and many HTML formats are possible
too... but right now I am just generating a single huge HTML as a
demo - here's how. This is my list of "interesting Facebook urls":

  http://angg.twu.net/fbcache/all-fb-urls.lst.html

I produced it by just grepping everything that looks like a Facebook
url from a handful of files, and sorting the results alphabetically.
Here is the "single huge HTML" produced from it:

  http://angg.twu.net/fbcache/huge.txt.html#715611308486879
  http://angg.twu.net/fbcache/huge.txt.html#10152706679833189

The posts in it are ordered chronologically, with the older (and the
undated) posts at the top. The "#digits" parts in the urls jump to
specific posts.

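
As a rough illustration of two of the steps described above (fburl
--> fbid, and the pplua round trip), here is a minimal Lua sketch.
It is not the code in fbcache.lua: every "sketch_" name is
hypothetical, the fbid rule is an assumption that only covers the two
kinds of fburl shown in the "Fburls" section, and the object at the
end is made up to stand in for a parsed JSON.

```lua
-- Minimal sketch; all "sketch_" names are hypothetical, not fbcache.lua's.

-- fburl --> fbid, for the two kinds of fburl shown in "Fburls" only.
-- Assumption: in these kinds the fbid is the last all-digit path
-- component. Other kinds of Facebook URLs need other rules.
function sketch_fburl_to_fbid(fburl)
  local path = fburl:gsub("[?#].*$", "")  -- drop any query string or fragment
  return path:match("/(%d+)/?$")          -- nil when the rule does not apply
end

-- o --> pplua: pretty-print a table (strings, numbers, booleans and
-- nested tables only) as a Lua table constructor, with sorted keys.
function sketch_pp(o, indent)
  indent = indent or ""
  local t = type(o)
  if t == "string" then return string.format("%q", o) end
  if t == "number" or t == "boolean" then return tostring(o) end
  assert(t == "table", "cannot serialize a " .. t)
  local keys = {}
  for k in pairs(o) do keys[#keys + 1] = k end
  table.sort(keys, function (a, b) return tostring(a) < tostring(b) end)
  if #keys == 0 then return "{}" end
  local inner = indent .. "  "
  local lines = {}
  for _, k in ipairs(keys) do
    lines[#lines + 1] = inner .. "[" .. sketch_pp(k, inner) .. "] = "
                              .. sketch_pp(o[k], inner) .. ","
  end
  return "{\n" .. table.concat(lines, "\n") .. "\n" .. indent .. "}"
end

-- pplua --> o: read the table back with "just an eval".
-- loadstring exists in Lua 5.1 / LuaJIT; load accepts strings in 5.2+.
function sketch_pplua_read(pplua)
  local loader = loadstring or load
  local chunk = assert(loader("return " .. pplua))
  return chunk()
end

-- Usage, with the two fburls from the "Fburls" section:
local fburls = {
  "https://www.facebook.com/jean.wyllys/photos/a.201340996580582.48122.163566147024734/715611308486879/",
  "https://www.facebook.com/CopBlock/posts/10152706679833189",
}
for _, fburl in ipairs(fburls) do
  print(sketch_fburl_to_fbid(fburl))
end

-- A made-up object standing in for a parsed JSON (illustrative fields).
local o = { id = "10152706679833189", message = "line 1\nline 2",
            tags = { "ee", "is" } }
local pplua = sketch_pp(o)
print(pplua)
assert(sketch_pplua_read(pplua).message == o.message)
```

Only evaluate pplua files that you wrote yourself: reading them back
runs them as Lua code, which is fine for a local cache but not for
untrusted input.
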
Notes about the innards (for a rewrite)
=======================================

Low-level functions
-------------------
fburl_to_kparts          (fburl)        return kind, parts
fburl_to_fbids           (fburl)        return fbids_array
fbid_fetch_from_facebook (fbid)         return json
json_parse               (json)         return o
fbid_write_json          (fbid, json)
fbid_read_json           (fbid)         return json
fbid_write_pplua         (fbid, pplua)
fbid_read_pplua          (fbid)         return pplua
o_to_txt                 (o)            return txt
fburl_clean              (fburl)        return fburl
fburls_in_string         (bigstr)       return fburls
pplua_ls                 ()             return fbids
json_ls                  ()             return fbids

Arrays, tables, big strings
---------------------------
fbkinds0
fbkinds
fbid_to_o
fbid_to_fburls
fburls
fbids

Update (2014nov11):
===================
Here's the rewrite of Fbcache, using the names above...

  http://angg.twu.net/fbcache/fbcache.lua.html
  http://angg.twu.net/fbcache/fbcache.lua

Old notes
=========
# (find-es "facebook" "google-groups-1")
# (find-es "facebook" "google-groups-2")

I spent months working on this - an archive of videos on youtube whose
links I received through facebook - and the script that I use to
maintain that archive has just become easy to install (on *NIXes and
OSX) and to use. Links:

  http://angg.twu.net/linkdasruas2.html
  http://angg.twu.net/youtube-db/README.html

Now I am trying to implement something similar for text. The thing is
that I live in Brazil, and people here are using facebook to create
very improvised alternative media channels... most people here get
their news only through the corporate media, which only presents a
whitewash of a tiny fraction of what is going on, and we are looking
for ways to make the "alternative news" more available.

Facebook has lots of problems, and in this case the two most glaring
ones are:

  1) only a tiny part of what we post is redistributed,
  2) history is hard to dig through - if our friend Jane Doe posted
     something very interesting three days ago it is hard to locate
     that in her timeline to show it to someone else.

My idea is: when you find an interesting post on facebook, copy its
*PERMALINK* - possibly with a tag and a comment - to a file that you
are editing with a text editor, where you keep all those permalinks.
Let me call this file "NOTES". Right now I am only using the tags
'[ee]', for things related to "state of exception" ("estado de
exceção" - political prisoners and the lifting of constitutional
guarantees) and '[is]' for "Israel" (btw, the situation in Gaza is
similar to the one in our slums). With grep, sort and a few other
tricks it is easy to produce listings like these,

  http://angg.twu.net/ee.html
  http://angg.twu.net/links-about-gaza.html

in which I can _sometimes_ find what I am looking for by searching for
keywords and for red stars.

The next stage, which is where fbcmd comes in, is to have local copies
of the text of each one of the facebook links in these listings. With
those local copies several tasks become easy - for example, searching,
cut-and-paste, and producing HTML pages outside of facebook with the
full text (and links to the original posts!) of relevant posts.

# Local Variables:
# coding: raw-text-unix
# End: