Translating from another language is a separate puzzle, but Google Translate has an API you can hit, which would allow that to be done programmatically.
Here's the Edery paper converted to a raw txt file using Google's Tesseract OCR engine.
The txt file could be reformatted back to a PDF for easier reading, and presumably a non-English paper could be translated sentence by sentence after parsing the text out of the PDF.
The script below is pulled directly from the net; I just pasted in the name of the PDF and set up my Debian-based OS to be able to run it.
# Import libraries
from PIL import Image
import pytesseract
from pdf2image import convert_from_path

# Path of the PDF
PDF_file = "EDERY 1972.pdf"

'''
Part #1: Converting the PDF to images
'''
# Store all the pages of the PDF in a list (rendered at 500 DPI)
pages = convert_from_path(PDF_file, 500)

# Counter used to name the image of each page
image_counter = 1

# Iterate through all the pages stored above
for page in pages:
    # Each page is saved as a JPG:
    # PDF page 1 -> page_1.jpg
    # PDF page 2 -> page_2.jpg
    # ....
    # PDF page n -> page_n.jpg
    filename = "page_" + str(image_counter) + ".jpg"
    # Save the image of the page to disk
    page.save(filename, 'JPEG')
    # Increment the counter to update the filename
    image_counter += 1

'''
Part #2: Recognizing text from the images using OCR
'''
# Total number of pages written out above
filelimit = image_counter - 1

# Text file that will hold the OCR output
outfile = "out_text.txt"

# Open the file in append mode so the contents of all
# the page images end up in the same file
with open(outfile, "a") as f:
    # Iterate from 1 to the total number of pages
    for i in range(1, filelimit + 1):
        # Set the filename to recognize text from; again,
        # these files are page_1.jpg, page_2.jpg, ..., page_n.jpg
        filename = "page_" + str(i) + ".jpg"
        # Recognize the text in the image using pytesseract
        text = pytesseract.image_to_string(Image.open(filename))
        # Any string processing may be applied to the text here.
        # One basic fix: in many PDFs, a word that doesn't fit at
        # a line ending gets a trailing hyphen, with the rest of
        # the word on the next line (e.g. "GeeksF-\norGeeks").
        # Rejoin those words by replacing every '-\n' with ''.
        text = text.replace('-\n', '')
        # Finally, write the processed text to the file
        f.write(text)
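As a rough sketch of the sentence-by-sentence idea mentioned above: split the OCR output into sentences and feed each one to a translation function. The `translate` function here is a hypothetical stand-in for whatever engine you wire up (Google's API, DeepL, etc.); only the splitting logic is real.

```python
import re

def translate(sentence, target="en"):
    # Hypothetical placeholder: swap in a real call to Google
    # Translate, DeepL, or another engine here. For this sketch
    # it just returns the sentence unchanged.
    return sentence

def translate_text(text, target="en"):
    # Naive sentence split on '.', '!', or '?' followed by
    # whitespace; good enough for a rough pass over OCR output.
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    return " ".join(translate(s, target) for s in sentences if s)

# Usage: feed it the OCR output produced above
# with open("out_text.txt") as f:
#     print(translate_text(f.read()))
```

Translating sentence by sentence keeps each API request small, though it can lose cross-sentence context, so paragraph-sized chunks may translate better if the engine allows them.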
Deepl.com is the best translation engine I know of. It translates whole PDFs and Word documents while keeping the formatting. A trial Pro account does it all, and they have an app. It specializes in European languages.
Sci-hub (https://sci-hub.se/DOI-HERE) typically has access to most papers. Sometimes very recent papers may not be accessible (like those published the same month as the sci-hub search). If it doesn’t have login credentials for specific journals, there are other ways to get the papers.
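Since the lookup is just the DOI appended to the mirror URL, it's trivial to build the link programmatically too (the DOI below is only an example placeholder):

```python
def scihub_url(doi, mirror="https://sci-hub.se"):
    # Sci-hub resolves papers by DOI appended to the mirror URL
    return f"{mirror}/{doi}"

print(scihub_url("10.1000/example-doi"))
# -> https://sci-hub.se/10.1000/example-doi
```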
I read this paper in 2012 and have a copy in my drive. Unfortunately, it’s not super impressive.
The study @pdxcanna just asked for isn’t available on Sci-hub. Either it’s too new, or sci-hub doesn’t have the login credentials to access the journal.
So, in that case, a good trick is to search for the paper in Google Scholar, then click "All x versions" to see every link in Google Scholar's database. In many cases, one of those links will be, or will take you to, the full text.
Another thing to try is searching ResearchGate and europepmc.org directly. Europe PMC is an archive of life-sciences journal literature; sometimes they have the full text available.
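Europe PMC also exposes a public REST API, which is handy if you want to script these lookups instead of searching by hand. A minimal sketch of building a search URL (the endpoint is Europe PMC's REST search service; the query string is just an example):

```python
from urllib.parse import urlencode

def europepmc_search_url(query):
    # Europe PMC's REST search endpoint; returns matching records
    # as JSON, including open-access flags you can filter on.
    base = "https://www.ebi.ac.uk/europepmc/webservices/rest/search"
    params = {"query": query, "format": "json"}
    return f"{base}?{urlencode(params)}"

# Fetch the result with urllib.request.urlopen(...) or requests.get(...)
print(europepmc_search_url("cannabis elderly"))
```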
Here is how I found that study for @pdxcanna using Google Scholar, which isn’t on Sci-hub right now:
1. Search Google Scholar for the title or DOI. Then click on the "All x versions" link:
Also, in the case of the paper @pdxcanna asked for, the full-text link was on the PubMed page he shared. PubMed Central (PMC), run by the US NIH, and other archives of journal literature (like Europe PMC, PLOS ONE, etc.) will often provide links to the free full text of the paper:
Try sci-hub.st to find papers. If they don't have one, you can request it by providing them with the link.
I usually upload my papers to ResearchGate. I hate what these publishing companies do. They don't pay anything to the researchers who do all the hard work, and then they sell the paper, making money off someone else's work. I even get emails asking me to review papers for free. And I'm like, what are you guys doing then?