   _____ _                    _          
  |  ___| |__   ___ __ _  ___| |__   ___ 
  | |_  | '_ \ / __/ _` |/ __| '_ \ / _ \
  |  _| | |_) | (_| (_| | (__| | | |  __/
  |_|   |_.__/ \___\__,_|\___|_| |_|\___|

  http://angg.twu.net/fbcache/README.html
  http://angg.twu.net/fbcache/fbcache.lua.html (current version)
  http://angg.twu.net/fbcache/urls.lua.html   (previous version)
  By: Eduardo Ochs
  Version: 2014nov11
  License: GPL3
  Contributors:
    Iury Oliveira 
    Pedro Nascimento
    Srinivas Mangipudi


Fbcache is a library for keeping a cache of interesting Facebook posts.
More precisely, it is a set of functions that convert between Facebook
URLs (permalinks) and names of local files in the cache, and between
Facebook URLs and the IDs used internally by Facebook; that connect to
Facebook with a subset of your user's permissions to download posts
readable by you; and that convert these downloaded posts, which are
in JSON, to text.

The main operations that fbcache performs can be summarized in this
diagram:

  fburl --> fbid --> json --> pplua --> txt --> html


Fburls
======
A "fburl" is a permalink like these:

  https://www.facebook.com/jean.wyllys/photos/a.201340996580582.48122.163566147024734/715611308486879/
  https://www.facebook.com/CopBlock/posts/10152706679833189


Fbids
=====
A "fbid" is one of the sequences of digits in a fburl; which one
depends on the kind of URL. I currently know at least 22 different
kinds of Facebook URLs; for some of them I know which part is the
fbid, while for others I don't (see the section "problems"
below). For the fburls above, the fbids are:

  715611308486879
  10152706679833189
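
As an illustration only, here is a minimal Python sketch (fbcache
itself is written in Lua) that handles just the two kinds of fburls
shown above. It assumes that for these kinds the fbid is the last
path component made only of digits; the name fburl_to_fbid is
hypothetical, and other kinds of URLs would need their own rules.

```python
import re
from urllib.parse import urlparse

def fburl_to_fbid(fburl):
    """Return the last all-digit path component of a permalink, or None.

    Assumption: this rule only covers the two kinds of fburls shown
    above; it is not fbcache's actual per-kind logic.
    """
    path = urlparse(fburl).path
    parts = [p for p in path.split("/") if p]
    digit_parts = [p for p in parts if re.fullmatch(r"\d+", p)]
    return digit_parts[-1] if digit_parts else None

if __name__ == "__main__":
    print(fburl_to_fbid("https://www.facebook.com/CopBlock/posts/10152706679833189"))
```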


Json
====
When we have a fbid and a valid access token for Facebook we can fetch
from Facebook the (current) content of that fbid; that fetching is
done via HTTP using Facebook's "Graph API", which returns an object in
JSON format.

We do not want to redo that fetching all the time - it can take
several seconds, posts can be deleted, our access token may expire,
and so on - so we cache these JSONs by saving them into a cache
directory.
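
The fetch-once-then-cache idea can be sketched in Python as below.
This is not fbcache's code: the directory name "json-cache", the
".json" suffix, the function names, and the exact shape of the Graph
API URL are all assumptions made for the example. The fetcher is
passed in as a parameter so that the cache logic can be tried without
touching the network.

```python
import os
import urllib.parse
import urllib.request

CACHE_DIR = "json-cache"  # hypothetical directory name

def fetch_from_graph_api(fbid, access_token):
    """Fetch the JSON text for one fbid over HTTP (slow, and may fail).

    Assumption: an unversioned Graph API URL of this shape is accepted.
    """
    query = urllib.parse.urlencode({"access_token": access_token})
    url = "https://graph.facebook.com/%s?%s" % (fbid, query)
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

def cached_json(fbid, fetch, cache_dir=CACHE_DIR):
    """Return the JSON text for fbid, calling fetch(fbid) only on a miss."""
    path = os.path.join(cache_dir, fbid + ".json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()
    text = fetch(fbid)
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
    return text
```

A real call would look like
cached_json(fbid, lambda i: fetch_from_graph_api(i, token)),
where token is your own access token.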


Pplua
=====
All further processing of Facebook posts by fbcache depends on
the data in these JSONs. To speed things up, when we fetch a
JSON we parse it to generate a Lua table with exactly the same
structure, pretty-print the contents of that table, and save
the result to a file in another cache directory. That pretty-printed
table in Lua syntax ("pplua") can be read back by Lua with just an
"eval", and is reasonably human-friendly.

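The same trick can be shown by analogy in Python: parse the JSON
once, pretty-print the result in the host language's own literal
syntax, and read it back later with an eval-like call. Note that this
is only an analogy - fbcache writes Lua syntax, not Python - and that
the function names below are made up for the example.

```python
import ast
import json
import pprint

def json_to_pp(json_text):
    """Parse JSON and pretty-print the result as a Python literal."""
    return pprint.pformat(json.loads(json_text), width=70)

def pp_read(pp_text):
    """Read a pretty-printed literal back into a Python object.

    ast.literal_eval only accepts literals, so it plays the role of a
    restricted "eval" here.
    """
    return ast.literal_eval(pp_text)

if __name__ == "__main__":
    text = '{"id": "1", "from": {"name": "Jane Doe"}, "message": "hi"}'
    print(json_to_pp(text))
```
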


Txt
===
The "txt version" of a Facebook post is produced from a Facebook
object by extracting the text of _some_ of its fields - sender, date,
message, etc - and formatting that into a plain text format which is
(mostly) 70 columns wide. Many formats of txt output are possible, and
right now I am using a single very simplistic one. The main ideas here
are:

  * once we have a cache of pplua objects, producing txts from them is
    VERY fast,

  * new txt formats are easy to add.
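
To make this concrete, here is a minimal Python sketch of one
possible txt format; it is not the format that fbcache actually
produces. It assumes that the object has the Graph API fields "from"
(with a "name"), "created_time" and "message", and the "From:" and
"Date:" labels are invented for the example. Long words such as URLs
are left unbroken, which is why the output is only mostly 70 columns
wide.

```python
import textwrap

def o_to_txt(o, width=70):
    """Format some fields of a post object as plain text."""
    sender = o.get("from", {}).get("name", "(unknown sender)")
    date = o.get("created_time", "(undated)")
    message = o.get("message", "")
    lines = ["From: " + sender, "Date: " + date, ""]
    for paragraph in message.split("\n"):
        wrapped = textwrap.wrap(paragraph, width=width,
                                break_long_words=False,
                                break_on_hyphens=False)
        lines.extend(wrapped or [""])  # keep empty lines between paragraphs
    return "\n".join(lines) + "\n"

if __name__ == "__main__":
    print(o_to_txt({"from": {"name": "Jane Doe"},
                    "created_time": "2014-11-11T00:00:00+0000",
                    "message": "A short example message."}))
```

Adding another txt format then amounts to writing another function
with the same shape.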



Html
====
Many txt formats are possible, and many HTML formats are possible
too... but right now I am just generating a single huge HTML as a
demo - here's how.

This is my list of "interesting Facebook urls":

  http://angg.twu.net/fbcache/all-fb-urls.lst.html

I produced it by just grepping everything that looks like a Facebook
URL from a handful of files, and sorting the results alphabetically.
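
A rough Python equivalent of that grep-and-sort step is sketched
below. The regular expression is only a guess at what "looks like a
Facebook url" means, and dropping duplicates is an extra assumption
of this sketch.

```python
import re

# Assumption: a Facebook URL runs until whitespace, a quote, an angle
# bracket or a closing parenthesis.
FBURL_RE = re.compile(r"https?://(?:www\.)?facebook\.com/[^\s\"'<>)]+")

def fburls_in_string(bigstr):
    """Return the distinct Facebook-looking URLs in bigstr, sorted."""
    return sorted(set(FBURL_RE.findall(bigstr)))

if __name__ == "__main__":
    notes = "see https://www.facebook.com/CopBlock/posts/10152706679833189 !"
    print(fburls_in_string(notes))
```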

Here is the "single huge HTML" produced from that list:

  http://angg.twu.net/fbcache/huge.txt.html#715611308486879
  http://angg.twu.net/fbcache/huge.txt.html#10152706679833189

The posts in it are ordered chronologically, with the older (and the
undated) posts at the top. The "#digits" parts in the urls jump to
specific posts.
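
The ordering and the "#digits" anchors can be sketched in Python like
this. It is not the code that generated the page above: the tuple
layout, the function name and the HTML markup are assumptions, and
the sort relies on the dates being ISO 8601 strings in a single
timezone, so that sorting them as strings is chronological.

```python
import html

def posts_to_html(posts):
    """Build one HTML page from (fbid, created_time_or_None, txt) tuples.

    Undated posts get an empty sort key, so they land at the top,
    followed by the dated posts from oldest to newest.
    """
    ordered = sorted(posts, key=lambda p: p[1] or "")
    chunks = []
    for fbid, _created, txt in ordered:
        # The anchor name is the fbid, so "page.html#<fbid>" jumps here.
        chunks.append('<a name="%s"></a>\n<pre>%s</pre>\n'
                      % (fbid, html.escape(txt)))
    return "<html><body>\n" + "".join(chunks) + "</body></html>\n"
```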







Notes about the innards (for a rewrite)
=======================================

Low-level functions
-------------------
fburl_to_kparts  (fburl)         return kind, parts
fburl_to_fbids   (fburl)         return fbids_array

fbid_fetch_from_facebook (fbid)  return json

json_parse       (json)          return o
fbid_write_json  (fbid, json)
fbid_read_json   (fbid)          return json
fbid_write_pplua (fbid, pplua)
fbid_read_pplua  (fbid)          return pplua

o_to_txt         (o)             return txt

fburl_clean      (fburl)         return fburl
fburls_in_string (bigstr)        return fburls
pplua_ls         ()              return fbids
json_ls          ()              return fbids

Arrays, tables, big strings
---------------------------
fbkinds0
fbkinds
fbid_to_o
fbid_to_fburls
fburls
fbids

Update (2014nov11):
===================
Here's the rewrite of Fbcache, using the names above...
  http://angg.twu.net/fbcache/fbcache.lua.html
  http://angg.twu.net/fbcache/fbcache.lua











Old notes
=========
# (find-es "facebook" "google-groups-1")
# (find-es "facebook" "google-groups-2")

I spent months working on this - an archive of videos on youtube whose
links I received through facebook - and the script that I use to
maintain that archive has just become easy to install (on *NIXes and
OSX) and to use. Links:

  http://angg.twu.net/linkdasruas2.html
  http://angg.twu.net/youtube-db/README.html

Now I am trying to implement something similar for text. The thing is
that I live in Brazil, and people here are using facebook to create
very improvised alternative media channels... most people here get
their news only through the corporate media, which presents only a
whitewash of a tiny fraction of what is going on, and we are looking
for ways to make the "alternative news" more available. Facebook has
lots of problems, and in this case the two most glaring ones are: 1)
only a tiny part of what we post is redistributed, 2) history is hard
to dig through - if our friend Jane Doe posted something very
interesting three days ago it is hard to locate that in her timeline
to show it to someone else.

My idea is: when you find an interesting post on facebook, copy its
*PERMALINK* - possibly with a tag and a comment - to a file you're
editing with a text editor, where you are putting all those
permalinks. Let me call this file "NOTES".

Right now I am only using the tags '[ee]', for things related to
"state of exception" ("estado de exceção" - political prisoners
and the lifting of constitutional guarantees) and '[is]' for "Israel"
(btw, the situation in Gaza is similar to the one in our slums). With
grep, sort and a few other tricks it is easy to produce listings like
these,

  http://angg.twu.net/ee.html
  http://angg.twu.net/links-about-gaza.html

in which I can _sometimes_ find what I am looking for by searching for
keywords and for red stars.

The next stage, which is where fbcmd comes in, is to have local copies
of the text of each one of the facebook links in these listings. With
those local copies several tasks can become easy - for example,
searching, cut-and-paste, and producing HTML pages outside of facebook
with the full text (and links to the original posts!) of relevant
posts.






# Local Variables:
# coding: raw-text-unix
# End: